addTypeDetails

Add token type details to documents

Description

updatedDocuments = addTypeDetails(documents) detects the token types in documents and updates the token details. The function adds type details only to tokens whose type is unknown. To get the token types from updatedDocuments, use tokenDetails.

updatedDocuments = addTypeDetails(documents,Name,Value) specifies additional options using one or more name-value pairs.

Tip

Use addTypeDetails before using the lower, upper, and erasePunctuation functions, because addTypeDetails relies on information that these functions remove.
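The ordering described in the tip can be sketched as follows. This is a minimal illustration rather than an official example: the token list and the www.example.com address are made up for the purpose, and no particular output is claimed.

```matlab
% Manually tokenized text (hypothetical tokens), so token types start out unknown.
str = ["Visit" "www.example.com" "today" "!"];
documents = tokenizedDocument(str,'TokenizeMethod','none');

% Detect token types first, while case and punctuation are still intact.
documents = addTypeDetails(documents);

% Only then normalize the text.
documents = lower(documents);
documents = erasePunctuation(documents);

% Inspect the remaining tokens and their recorded types.
tdetails = tokenDetails(documents)
```

Reversing the order (normalizing first, then calling addTypeDetails) would leave the type detection working from text that has already lost some of the cues it uses.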

Examples

Convert manually tokenized text into a tokenizedDocument object, setting the 'TokenizeMethod' option to 'none'.

str = ["For" "more" "information" "," "see" "//www.tatmou.com/jp/" "."];
documents = tokenizedDocument(str,'TokenizeMethod','none')

documents = 
  tokenizedDocument:

   7 tokens: For more information , see //www.tatmou.com/jp/ .

View the token details using the tokenDetails function.

tdetails = tokenDetails(documents)
tdetails=7×2 table
            Token              DocumentNumber
    ______________________    ______________

    "For"                           1
    "more"                          1
    "information"                   1
    ","                             1
    "see"                           1
    "//www.tatmou.com/jp/"          1
    "."                             1

If you set 'TokenizeMethod' to 'none' in the call to the tokenizedDocument function, then the function does not detect the types of the tokens. To add the token type details, use the addTypeDetails function.

documents = addTypeDetails(documents);

View the updated token details.

tdetails = tokenDetails(documents)
tdetails=7×3 table
            Token              DocumentNumber       Type
    ______________________    ______________    ___________

    "For"                           1           letters
    "more"                          1           letters
    "information"                   1           letters
    ","                             1           punctuation
    "see"                           1           letters
    "//www.tatmou.com/jp/"          1           web-address
    "."                             1           punctuation

Input Arguments

Input documents, specified as a tokenizedDocument array.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'TopLevelDomains',["com" "net" "org"] specifies the top-level domains "com", "net", and "org" for web address detection.

Top-level domains to use for web address detection, specified as a character vector, string array, or cell array of character vectors.

If you do not specify TopLevelDomains, then the function uses the output of the topLevelDomains function.

Example: ["com" "net" "org"]

Data Types: char | string | cell
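As a rough sketch of how the TopLevelDomains option might be used, the following restricts web address detection to three domains. The tokens and the example.com and example.xyz addresses are invented for illustration, and the resulting types are not asserted here; inspect the Type column of the table to see how each token is classified.

```matlab
% Hypothetical manually tokenized text containing two address-like tokens.
str = ["See" "example.com" "or" "example.xyz"];
documents = tokenizedDocument(str,'TokenizeMethod','none');

% Restrict web address detection to the listed top-level domains.
documents = addTypeDetails(documents,'TopLevelDomains',["com" "net" "org"]);

% Inspect the detected types.
tdetails = tokenDetails(documents)
```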

Option to discard previously computed details and recompute them, specified as true or false.

Data Types: logical
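A possible use of this option is to force type detection to run again on documents that already carry type details. The argument name is not shown in the text above; the sketch below assumes it is 'DiscardKnownValues', as in the related detail functions of the toolbox, and the sample sentence is made up.

```matlab
% Default tokenization already detects token types.
documents = tokenizedDocument("Contact us at www.example.com.");

% Assumed option name 'DiscardKnownValues': discard the existing type
% details and recompute them instead of filling in only unknown types.
documents = addTypeDetails(documents,'DiscardKnownValues',true);

% Inspect the recomputed details.
tdetails = tokenDetails(documents)
```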

Output Arguments

Updated documents, returned as a tokenizedDocument array. To get the token details from updatedDocuments, use tokenDetails.

Version History

Introduced in R2018b