
docfun

Apply function to words in documents

Description


newDocuments = docfun(func,documents) calls the function specified by the function handle func and passes elements of documents as a string vector of words.

  • If func accepts exactly one input argument, then the words of newDocuments(i) are the output of func(string(documents(i))).

  • If func accepts two input arguments, then the words of newDocuments(i) are the output of func(string(documents(i)),details), where details contains the corresponding token details output by tokenDetails.

  • If func changes the number of words in the document, then docfun removes the token details from that document.
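
The one-input case, including a function that changes the number of words, can be sketched as follows. This is a minimal illustration, assuming Text Analytics Toolbox is installed; keepLongWords is a hypothetical helper name that does not come from the toolbox.

```matlab
% Sample documents (same text as the examples on this page).
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);

% Hypothetical helper: keep only the words longer than two characters.
% docfun calls it once per document with that document's words as a string array.
keepLongWords = @(words) words(strlength(words) > 2);

newDocuments = docfun(keepLongWords,documents);

% Compare the number of words per document before and after the call.
lengthsBefore = doclength(documents);
lengthsAfter = doclength(newDocuments);
```

Because keepLongWords returns fewer words than it receives, the third rule above applies: the token details of the affected documents are removed.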

docfun does not perform the calls to function func in a specific order.


newDocuments = docfun(func,documents1,...,documentsN) calls the function specified by the function handle func and passes elements of documents1,…,documentsN as string vectors of words, where N is the number of inputs to the function func. The words of newDocuments(i) are the output of func(string(documents1(i)),...,string(documentsN(i))).

Each of documents1,…,documentsN must be the same size.
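
A minimal sketch of the multiple-input syntax with a custom function, assuming Text Analytics Toolbox is installed. The name joinWithSlash and the "/" separator are illustrative choices, not part of the toolbox.

```matlab
% Two document arrays of the same size, with matching word counts per document.
documents1 = tokenizedDocument([ ...
    "a short sentence"
    "another sentence"]);
documents2 = tokenizedDocument([ ...
    "_det _adj _noun"
    "_det _noun"]);

% docfun requires all document inputs to be the same size.
assert(isequal(size(documents1),size(documents2)))

% Hypothetical tagging function: join each word and its tag with "/".
% With two document inputs, docfun passes one string array per input.
joinWithSlash = @(words,tags) words + "/" + tags;

newDocuments = docfun(joinWithSlash,documents1,documents2);
taggedLengths = doclength(newDocuments);
```

The element-wise string concatenation inside joinWithSlash relies on corresponding documents having the same number of words, so the function should be chosen with the contents of the inputs in mind.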

Examples


Apply reverse to each word in a document array.

documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"])

documents =
  2x1 tokenizedDocument:
    6 tokens: an example of a short sentence
    4 tokens: a second short sentence

func = @reverse;
newDocuments = docfun(func,documents)

newDocuments =
  2x1 tokenizedDocument:
    6 tokens: na elpmaxe fo a trohs ecnetnes
    4 tokens: a dnoces trohs ecnetnes

Tag words by combining the words from one document array with another, using the string function plus.

Create the first tokenizedDocument array. Erase the punctuation and convert the text to lowercase.

str = [ ...
    "An example of a short sentence."
    "A second short sentence."];
str = erasePunctuation(str);
str = lower(str);
documents1 = tokenizedDocument(str)

documents1 =
  2x1 tokenizedDocument:
    6 tokens: an example of a short sentence
    4 tokens: a second short sentence

Create the second tokenizedDocument array. The documents have the same number of words as the corresponding documents in documents1. The words of documents2 are POS tags for the corresponding words.

documents2 = tokenizedDocument([ ...
    "_det _noun _prep _det _adj _noun"
    "_det _adj _adj _noun"])

documents2 =
  2x1 tokenizedDocument:
    6 tokens: _det _noun _prep _det _adj _noun
    4 tokens: _det _adj _adj _noun

func = @plus;
newDocuments = docfun(func,documents1,documents2)

newDocuments =
  2x1 tokenizedDocument:
    6 tokens: an_det example_noun of_prep a_det short_adj sentence_noun
    4 tokens: a_det second_adj short_adj sentence_noun

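The output is not the same as calling plus on the documents directly, which appends the words of documents2 to the corresponding documents in documents1 instead of combining them word by word.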

plus(documents1,documents2)

ans =
  2x1 tokenizedDocument:
    12 tokens: an example of a short sentence _det _noun _prep _det _adj _noun
     8 tokens: a second short sentence _det _adj _adj _noun

Input Arguments


func: Function handle that accepts N string arrays as inputs and outputs a string array. func must accept string(documents1(i)),...,string(documentsN(i)) as input.

Function handle to apply to words in documents. The function must have one of the following syntaxes:

  • newWords = func(words), where words is a string array of the words of a single document.

  • newWords = func(words,details), where words is a string array of the words of a single document, and details is the corresponding table of token details given by tokenDetails.

  • newWords = func(words1,...,wordsN), where words1,...,wordsN are string arrays of words.

Example: @reverse

Data Types: function_handle
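
The newWords = func(words,details) syntax can be sketched as below. This is an illustration under stated assumptions rather than a definitive recipe: it assumes Text Analytics Toolbox is installed, that the details table has one row per word, and that it includes the Type variable that tokenDetails reports. The name dropPunctuation is hypothetical.

```matlab
% Documents that still contain sentence-ending punctuation.
documents = tokenizedDocument([ ...
    "An example of a short sentence."
    "A second short sentence."]);

% Hypothetical helper: drop the tokens whose Type is "punctuation".
% Assumes details.Type has one entry per element of words.
dropPunctuation = @(words,details) words(details.Type ~= "punctuation");

% A two-input function with a single document array receives the token details.
newDocuments = docfun(dropPunctuation,documents);
lengthsAfter = doclength(newDocuments);
```

Since this function changes the number of words, the resulting documents no longer carry their token details.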

documents: Input documents, specified as a tokenizedDocument array.

Output Arguments


newDocuments: Output documents, returned as a tokenizedDocument array.

Introduced in R2017b