
removeInfrequentWords

Remove words with low counts from bag-of-words model

Description


newBag = removeInfrequentWords(bag,count) removes the words that appear at most count times in total from the bag-of-words model bag. By default, the function is case sensitive.
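The following hypothetical plain-MATLAB sketch illustrates the threshold rule without the toolbox. It is not the implementation of removeInfrequentWords. It assumes a documents-by-words count matrix like the Counts property of a bagOfWords object, hand-built here from the four documents used in the example on this page. The names vocabulary, totals, keep, and newVocabulary are illustrative only. The sketch sums each column to get a word's total count and keeps only the words whose total exceeds count.

```matlab
% Hypothetical sketch of the count-threshold rule on a plain count matrix.
% It does not call removeInfrequentWords or use a bagOfWords object.
vocabulary = {'an', 'example', 'of', 'a', 'short', 'sentence', 'second', 'another'};
counts = [1 1 1 1 1 1 0 0;   % "an example of a short sentence"
          0 0 0 1 1 1 1 0;   % "a second short sentence"
          0 1 0 0 0 0 0 1;   % "another example"
          0 1 0 1 1 0 0 0];  % "a short example"
count = 2;

totals = sum(counts, 1);     % total occurrences of each word across all documents
keep = totals > count;       % words appearing at most count times are dropped

newVocabulary = vocabulary(keep)
newCounts = counts(:, keep)
```

With count = 2, only "example", "a", and "short" (three occurrences each) remain. This matches the vocabulary that the documented example reports.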


newBag = removeInfrequentWords(bag,count,'IgnoreCase',true) removes the words that appear at most count times in total, ignoring case. If words differ only by case, their counts are merged before the threshold is applied.
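A second hypothetical sketch shows why merging case variants can change which words survive. It lowercases the vocabulary, sums the columns of words that collide, and then applies the same threshold. The vocabulary and counts are invented for illustration. Reporting merged words in lowercase is also an assumption of this sketch, not documented behavior of removeInfrequentWords.

```matlab
% Hypothetical sketch of case-insensitive thresholding on a plain count matrix.
% It does not call removeInfrequentWords; merged words are reported in
% lowercase here only as an illustrative choice.
vocabulary = {'Short', 'short', 'example', 'a'};
counts = [1 1 1 1;    % document 1
          1 0 1 1;    % document 2
          0 0 0 1];   % document 3
count = 2;

% Case sensitive: "Short" (2) and "short" (1) are judged separately,
% so both fall at or below the threshold and are removed.
caseSensitiveVocabulary = vocabulary(sum(counts, 1) > count)

% Ignoring case: map each word to its lowercase form and merge colliding columns.
[mergedVocabulary, ~, idx] = unique(lower(vocabulary), 'stable');
mergedCounts = zeros(size(counts, 1), numel(mergedVocabulary));
for j = 1:numel(vocabulary)
    mergedCounts(:, idx(j)) = mergedCounts(:, idx(j)) + counts(:, j);
end

% "short" now has a merged total of 3, so it survives the threshold.
ignoreCaseVocabulary = mergedVocabulary(sum(mergedCounts, 1) > count)
```

With the same count = 2, the case-sensitive pass keeps only "a". The case-insensitive pass keeps both "short" and "a".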

Examples


Remove the words that appear two times or fewer from a bag-of-words model.

Create a bag-of-words model from an array of tokenized documents.

documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "another example"
    "a short example"]);
bag = bagOfWords(documents)

bag = 
  bagOfWords with properties:

          Counts: [4x8 double]
      Vocabulary: ["an"    "example"    "of"    "a"    "short"    ...    ]
        NumWords: 8
    NumDocuments: 4

Remove the words that appear two times or fewer from the bag-of-words model.

count = 2;
newBag = removeInfrequentWords(bag,count)

newBag = 
  bagOfWords with properties:

          Counts: [4x3 double]
      Vocabulary: ["example"    "a"    "short"]
        NumWords: 3
    NumDocuments: 4

Input Arguments


bag: Input bag-of-words model, specified as a bagOfWords object.

count: Count threshold for removing words, specified as a positive integer. The function removes the words that appear count or fewer times in total.

Version History

Introduced in R2017b