Text Data Preparation
Import text data into MATLAB® and preprocess it for analysis
Text Analytics Toolbox™ includes tools for processing raw text from sources such as equipment logs, news feeds, surveys, operator reports, and social media. Use these tools to extract text from popular file formats, preprocess raw text, extract individual words or multiword phrases (n-grams), convert text into numerical representations, and build statistical models. For an example showing how to get started, see Prepare Text Data for Analysis.
Text Analytics Toolbox supports English, Japanese, German, and Korean. Most Text Analytics Toolbox functions also work with text in other languages. For more information, see Language Considerations.
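As a rough sketch of the workflow outlined above (tokenize, clean, then convert to a numeric representation), the following assumes Text Analytics Toolbox is installed. The sample strings are hypothetical, and the particular cleaning steps and thresholds are illustrative choices rather than a prescribed recipe.

```matlab
% Hypothetical raw text; any string array of documents would do.
textData = [
    "The pump pressure dropped below the threshold!"
    "Operator reports a loud rattling noise from the mixer."];

% Split the raw text into tokens.
documents = tokenizedDocument(textData);

% Add part-of-speech details first so that lemmatization can use them.
documents = addPartOfSpeechDetails(documents);

% Remove common stop words, lemmatize, and strip punctuation.
documents = removeStopWords(documents);
documents = normalizeWords(documents,'Style','lemma');
documents = erasePunctuation(documents);

% Drop very short and very long words (thresholds are illustrative).
documents = removeShortWords(documents,2);
documents = removeLongWords(documents,15);

% Convert to numerical representations: word counts and bigram counts.
bag = bagOfWords(documents);
bigrams = bagOfNgrams(documents,'NgramLengths',2);
```

The resulting `bag` and `bigrams` objects hold per-document counts that can serve as input to statistical models.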
Live Editor Tasks
Preprocess Text Data | Preprocess and clean up text data for analysis
Functions
Topics
Import
- Extract Text Data from Files
This example shows how to extract the text data from text, HTML, Microsoft® Word, PDF, CSV, and Microsoft Excel® files and import it into MATLAB® for analysis.
- Parse HTML and Extract Text Content
This example shows how to parse HTML code and extract the text content from particular elements.
- Data Sets for Text Analytics
Discover data sets for various text analytics tasks.
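The import topics above can be illustrated with a minimal sketch, assuming Text Analytics Toolbox is installed. The temporary file name, its contents, and the HTML snippet are hypothetical and exist only to keep the example self-contained.

```matlab
% Write a small hypothetical text file so the example is self-contained.
filename = fullfile(tempdir,"sample_report.txt");
fid = fopen(filename,"w");
fprintf(fid,"%s","Pump 3 tripped on high temperature.");
fclose(fid);

% Extract the text from the file. The same function also reads
% PDF, Microsoft Word, and HTML files.
str = extractFileText(filename);

% Parse a hypothetical HTML string and extract text from the <p> elements.
code = "<html><body><h1>Title</h1><p>First paragraph.</p><p>Second paragraph.</p></body></html>";
tree = htmlTree(code);
paragraphs = findElement(tree,"p");
paragraphText = extractHTMLText(paragraphs);
```

Here `findElement` takes a CSS selector, so other elements or classes can be targeted by changing the `"p"` argument.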
Preprocessing
- Preprocess Text Data in Live Editor
Explore text preprocessing techniques using the Preprocess Text Data Live Editor task.
- Prepare Text Data for Analysis
This example shows how to create a function which cleans and preprocesses text data for analysis.
- Analyze Text Data Containing Emojis
This example shows how to analyze text data containing emojis.
- Correct Spelling in Documents
This example shows how to correct spelling in documents using Hunspell.
- Create Extension Dictionary for Spelling Correction
This example shows how to create a Hunspell extension dictionary for spelling correction.
- Create Custom Spelling Correction Function Using Edit Distance Searchers
This example shows how to correct spelling using edit distance searchers and a vocabulary of known words.
- Analyze Sentence Structure Using Grammatical Dependency Parsing
This example shows how to extract information from a sentence using grammatical dependency parsing.
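For the spelling-correction topics listed above, a minimal sketch might look like the following, assuming Text Analytics Toolbox is installed. The misspelled sentence is hypothetical, and the corrections returned depend on the dictionary in use.

```matlab
% Hypothetical text containing misspellings.
str = "A documnent containing some misspelled worrds.";
documents = tokenizedDocument(str);

% Correct spelling using the built-in Hunspell-based corrector.
corrected = correctSpelling(documents);

% Edit distance underlies custom correction functions: it counts the
% insertions, deletions, and substitutions needed to turn one string
% into another.
d = editDistance("worrds","words");
```

With the default settings, `d` is 1 here because deleting a single "r" turns "worrds" into "words".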
Language Support
- Language Considerations
Information on using Text Analytics Toolbox features for other languages.
- Japanese Language Support
Information on Japanese support in Text Analytics Toolbox.
- Analyze Japanese Text Data
This example shows how to import, prepare, and analyze Japanese text data using a topic model.
- German Language Support
Information on German support in Text Analytics Toolbox.
- Analyze German Text Data
This example shows how to import, prepare, and analyze German text data using a topic model.
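As a small illustration of the language support described above, the sketch below tokenizes a hypothetical German sentence and inspects the language recorded for each token. It assumes Text Analytics Toolbox is installed. The language is set explicitly here, although it can also be left for the toolbox to detect.

```matlab
% Hypothetical German text.
str = "Guten Morgen. Wie geht es dir?";

% Tokenize with the language specified explicitly.
documents = tokenizedDocument(str,'Language','de');

% The token details table includes a Language column for each token.
details = tokenDetails(documents);
```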