
removeShortWords

Remove short words from documents or bag-of-words model

Description


newDocuments = removeShortWords(documents,len) removes words of length len or less from documents.
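The length rule works like a filter on the tokens. The following core MATLAB lines (no Text Analytics Toolbox required) apply the same criterion to a plain string array. This is only an illustration, and it assumes that "length" means the character count that strlength returns. It is not a description of how removeShortWords is implemented internally.

```matlab
% Sketch of the length criterion on a plain string array (not a tokenizedDocument).
words = ["An" "example" "of" "a" "short" "sentence"];
len = 2;

% Keep only words strictly longer than len; words of length len or less are dropped.
kept = words(strlength(words) > len)
```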


newBag = removeShortWords(bag,len) removes words of length len or less from the bagOfWords object bag.
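Because the function accepts either input type, you can remove short words before or after you build the model. The sketch below requires Text Analytics Toolbox, and the names bagA and bagB are illustrative. It builds the model both ways from the documents used in the examples on this page and compares the results. For this particular input, the two routes are expected to produce the same vocabulary and counts.

```matlab
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);

% Route 1: build the bag-of-words model first, then remove short words from it.
bagA = removeShortWords(bagOfWords(documents),2);

% Route 2: remove short words from the documents first, then build the model.
bagB = bagOfWords(removeShortWords(documents,2));

% Compare the vocabularies and the count matrices.
sameVocabulary = isequal(bagA.Vocabulary,bagB.Vocabulary)
sameCounts = isequal(full(bagA.Counts),full(bagB.Counts))
```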

Examples


Remove the words with two or fewer characters from a document.

    document = tokenizedDocument("An example of a short sentence");
    newDocument = removeShortWords(document,2)

    newDocument = 
      tokenizedDocument:

       3 tokens: example short sentence
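The len threshold is inclusive, so a word with exactly len characters is removed. The following minimal sketch requires Text Analytics Toolbox. It compares two thresholds on the same document and uses doclength to count the tokens that remain.

```matlab
document = tokenizedDocument("An example of a short sentence");

% len = 1 removes only one-character words ("a").
doc1 = removeShortWords(document,1);

% len = 2 also removes two-character words ("An", "of").
doc2 = removeShortWords(document,2);

n1 = doclength(doc1)   % 5 tokens remain
n2 = doclength(doc2)   % 3 tokens remain
```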

Remove the words with two or fewer characters from a bag-of-words model.

    documents = tokenizedDocument([ ...
        "an example of a short sentence"
        "a second short sentence"]);
    bag = bagOfWords(documents);
    newBag = removeShortWords(bag,2)

    newBag = 
      bagOfWords with properties:

              Counts: [2x4 double]
          Vocabulary: ["example"    "short"    "sentence"    "second"]
            NumWords: 4
        NumDocuments: 2
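The [2x4 double] entry is the word-count matrix. It has one row per document and one column per word in Vocabulary. The following short sketch requires Text Analytics Toolbox and displays the matrix for the model above. Counts is stored as a sparse matrix, so the sketch calls full to get a dense view for display.

```matlab
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);
newBag = removeShortWords(bagOfWords(documents),2);

% Rows are documents; columns follow newBag.Vocabulary
% ("example" "short" "sentence" "second").
counts = full(newBag.Counts)
```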

Input Arguments


Input documents, specified as a tokenizedDocument array.
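When the input is an array, the function processes each document, and the output contains one document for each input document. The following minimal sketch requires Text Analytics Toolbox. It uses doclength to check how many tokens remain in each document.

```matlab
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);

newDocuments = removeShortWords(documents,2);

% One entry per document: 3 tokens remain in each.
n = doclength(newDocuments)
```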

Input bag-of-words model, specified as a bagOfWords object.

Maximum length of words to remove, specified as a positive integer. The function removes words with len or fewer characters.

Output Arguments


Output documents, returned as a tokenizedDocument array.

Output bag-of-words model, returned as a bagOfWords object.

Introduced in R2017b