removeDocument

Remove documents from bag-of-words or bag-of-n-grams model

Description

newBag = removeDocument(bag,idx) removes the documents with indices specified by idx from the bag-of-words or bag-of-n-grams model bag. If the removed documents contain words or n-grams that do not appear in the remaining documents, then the function also removes these words or n-grams from bag.

Examples

Remove selected documents from a bag-of-words model.

documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"
    "a third example"
    "a final sentence"]);
bag = bagOfWords(documents)

bag = 
  bagOfWords with properties:

          Counts: [4x9 double]
      Vocabulary: ["an"    "example"    "of"    "a"    "short"    ...    ]
        NumWords: 9
    NumDocuments: 4

Remove the first and third documents from bag.

idx = [1 3];
newBag = removeDocument(bag,idx)

newBag = 
  bagOfWords with properties:

          Counts: [2x5 double]
      Vocabulary: ["a"    "short"    "sentence"    "second"    "final"]
        NumWords: 5
    NumDocuments: 2

Remove the same documents using logical indices.

idx = logical([1 0 1 0]);
newBag = removeDocument(bag,idx)

newBag = 
  bagOfWords with properties:

          Counts: [2x5 double]
      Vocabulary: ["a"    "short"    "sentence"    "second"    "final"]
        NumWords: 5
    NumDocuments: 2

Input Arguments

bag: Input bag-of-words or bag-of-n-grams model, specified as a bagOfWords object or a bagOfNgrams object.

idx: Indices of documents to remove, specified as a vector of numeric indices or a vector of logical indices.

Example: [2 4 6]

Example: [0 1 0 1 0 1]
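
A logical index vector does not have to be typed by hand. The following sketch is an illustration, not part of the original example set. It builds the index from the Counts property of the model used in the examples above. The variable names wordsPerDocument and minWords, and the cutoff of four words, are arbitrary choices made for this illustration.

```matlab
% Rebuild the example model so that this snippet is self-contained.
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"
    "a third example"
    "a final sentence"]);
bag = bagOfWords(documents);

% Total number of word occurrences in each document (row sums of Counts).
% full() converts a possibly sparse result into an ordinary column vector.
wordsPerDocument = full(sum(bag.Counts,2));

% Hypothetical cutoff chosen for this illustration only.
minWords = 4;

% Logical index: true for every document with fewer than minWords words.
idx = wordsPerDocument < minWords;

% Remove the flagged documents.
newBag = removeDocument(bag,idx);
```

Deriving idx from a condition keeps the selection rule visible in the code, which can be easier to maintain than a hard-coded list of document numbers.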

Output Arguments

newBag: Output model, returned as a bagOfWords object or a bagOfNgrams object. The type of newBag is the same as the type of bag.
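
The examples on this page use a bag-of-words model only. As a minimal sketch of the same call on a bag-of-n-grams model, the snippet below assumes that bagOfNgrams is called with its default settings on the same four documents, and checks that the output type matches the input type.

```matlab
% Same four documents as in the bag-of-words examples.
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"
    "a third example"
    "a final sentence"]);

% Create a bag-of-n-grams model using the default n-gram settings.
bag = bagOfNgrams(documents);

% Remove the first and third documents by numeric index.
newBag = removeDocument(bag,[1 3]);

% The output should have the same class as the input model.
sameType = strcmp(class(newBag),class(bag));
```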

Version History

Introduced in R2017b