Modeling and Prediction

Develop predictive models using topic models and word embeddings

To find clusters and extract features from high-dimensional text datasets, you can use machine learning techniques and models such as LSA, LDA, and word embeddings. You can combine features created with Text Analytics Toolbox™ with features from other data sources. With these features, you can build machine learning models that take advantage of textual, numeric, and other types of data.
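
As a rough illustration of combining text-derived features with features from another source, the sketch below fits a small LDA model and appends a numeric column to the resulting topic probabilities. The toy sentences, the `numericFeatures` values, and the choice of two topics are assumptions made up for this example, not data or settings from the toolbox documentation.

```matlab
% Toy documents (made-up data for illustration only).
textData = [
    "the engine overheated and the coolant leaked"
    "coolant leak caused the engine to stall"
    "the invoice total was wrong and the payment failed"
    "payment failed because the invoice was duplicated"];
documents = tokenizedDocument(textData);

% Hypothetical numeric feature from another data source, one row per document.
numericFeatures = [3.2; 2.9; 0.4; 0.7];

% Build a bag-of-words model and fit a 2-topic LDA model to it.
bag = bagOfWords(documents);
mdl = fitlda(bag,2,'Verbose',0);

% Per-document topic probabilities act as low-dimensional text features.
topicFeatures = transform(mdl,documents);

% Concatenate text-derived and numeric features into one predictor matrix.
X = [topicFeatures numericFeatures];
```

A matrix like `X` could then be passed to a classifier or regression function from another toolbox, for example `fitcecoc` in Statistics and Machine Learning Toolbox, together with a label vector.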

Functions

bagOfWords Bag-of-words model
bagOfNgrams Bag-of-n-grams model
addDocument Add documents to bag-of-words or bag-of-n-grams model
removeDocument Remove documents from bag-of-words or bag-of-n-grams model
removeInfrequentWords Remove words with low counts from bag-of-words model
removeInfrequentNgrams Remove infrequently seen n-grams from bag-of-n-grams model
removeWords Remove selected words from documents or bag-of-words model
removeNgrams Remove n-grams from bag-of-n-grams model
removeEmptyDocuments Remove empty documents from tokenized document array, bag-of-words model, or bag-of-n-grams model
topkwords Most important words in bag-of-words model or LDA topic
topkngrams Most frequent n-grams
encode Encode documents as matrix of word or n-gram counts
tfidf Term Frequency–Inverse Document Frequency (tf-idf) matrix
join Combine multiple bag-of-words or bag-of-n-grams models
vaderSentimentScores Sentiment scores with VADER algorithm
ratioSentimentScores Sentiment scores with ratio rule
fastTextWordEmbedding Pretrained fastText word embedding
wordEncoding Word encoding model to map words to indices and back
doc2sequence Convert documents to sequences for deep learning
wordEmbeddingLayer Word embedding layer for deep learning networks
word2vec Map word to embedding vector
word2ind Map word to encoding index
vec2word Map embedding vector to word
ind2word Map encoding index to word
isVocabularyWord Test if word is member of word embedding or encoding
readWordEmbedding Read word embedding from file
trainWordEmbedding Train word embedding
writeWordEmbedding Write word embedding file
wordEmbedding Word embedding model to map words to vectors and back
extractSummary Extract summary from documents
rakeKeywords Extract keywords using RAKE
textrankKeywords Extract keywords using TextRank
bleuEvaluationScore Evaluate translation or summarization with BLEU similarity score
rougeEvaluationScore Evaluate translation or summarization with ROUGE similarity score
bm25Similarity Document similarities with BM25 algorithm
cosineSimilarity Document similarities with cosine similarity
textrankScores Document scoring with TextRank algorithm
lexrankScores Document scoring with LexRank algorithm
mmrScores Document scoring with Maximal Marginal Relevance (MMR) algorithm
fitlda Fit latent Dirichlet allocation (LDA) model
fitlsa Fit LSA model
resume Resume fitting LDA model
logp Document log-probabilities and goodness of fit of LDA model
predict Predict top LDA topics of documents
transform Transform documents into lower-dimensional space
ldaModel Latent Dirichlet allocation (LDA) model
lsaModel Latent semantic analysis (LSA) model
addEntityDetails Add entity tags to documents
trainHMMEntityModel Train HMM-based model for named entity recognition (NER)
predict Predict entities using named entity recognition (NER) model
hmmEntityModel HMM-based model for named entity recognition (NER)
wordcloud Create word cloud chart from text, bag-of-words model, bag-of-n-grams model, or LDA model
textscatter 2-D scatter plot of text
textscatter3 3-D scatter plot of text

Topics

Classification and Modeling

Sentiment Analysis and Keyword Extraction

Deep Learning

Language Support