List of abbreviations for sentence detection, specified as a string array, character vector, cell array of character vectors, or a table.
If the input documents do not contain sentence details, then the function first runs the addSentenceDetails function and specifies the abbreviation list given by 'Abbreviations'. To specify more options for sentence detection (for example, sentence starters), use the addSentenceDetails function before using addPartOfSpeechDetails.
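As a minimal sketch of this workflow, assuming Text Analytics Toolbox is available, the following code adds sentence details with custom options before tagging parts of speech. The input text, the abbreviation list, and the starter list are illustrative assumptions, and the sketch assumes that the 'Starters' option accepts a string array.

```matlab
% Illustrative input text (hypothetical).
str = "Book an appt. We'll meet then.";
documents = tokenizedDocument(str);

% Add sentence details first so that extra options can be set.
% The abbreviation and starter lists are assumptions for illustration.
documents = addSentenceDetails(documents, ...
    'Abbreviations',"appt.", ...
    'Starters',["We'll" "They"]);

% The documents now contain sentence details, so this call should not
% need to run sentence detection again.
documents = addPartOfSpeechDetails(documents);

% Inspect the SentenceNumber and PartOfSpeech columns of the details.
tdetails = tokenDetails(documents);
head(tdetails)
```

Calling addSentenceDetails explicitly is only needed when options beyond the abbreviation list matter. Otherwise, passing 'Abbreviations' directly to addPartOfSpeechDetails is shorter.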
If Abbreviations is a string array, character vector, or cell array of character vectors, then the function treats these as regular abbreviations. If the next word is a capitalized sentence starter, then the function breaks at the trailing period. The function ignores any differences in the letter case of the abbreviations. Specify the sentence starters using the Starters name-value pair.
To specify different behaviors when splitting sentences at abbreviations, specify Abbreviations as a table. The table must have variables named Abbreviation and Usage, where Abbreviation contains the abbreviations, and Usage contains the type of each abbreviation. The following table describes the possible values of Usage, and the behavior of the function when passed abbreviations of these types.
| Usage | Behavior | Example Abbreviation | Example Text | Detected Sentences |
|---|---|---|---|---|
| regular | If the next word is a capitalized sentence starter, then break at the trailing period. Otherwise, do not break at the trailing period. | "appt." | "Book an appt. We'll meet then." | "Book an appt.", "We'll meet then." |
| | | | "Book an appt. today." | "Book an appt. today." |
| inner | Do not break after the trailing period. | "Dr." | "Dr. Smith." | "Dr. Smith." |
| reference | If the next token is not a number, then break at the trailing period. If the next token is a number, then do not break at the trailing period. | "fig." | "See fig. 3." | "See fig. 3." |
| | | | "Try a fig. They are nice." | "Try a fig.", "They are nice." |
| unit | If the previous word is a number and the following word is a capitalized sentence starter, then break at the trailing period. | "in." | "The height is 30 in. The width is 10 in." | "The height is 30 in.", "The width is 10 in." |
| | If the previous word is a number and the following word is not capitalized, then do not break at the trailing period. | | "The item is 10 in. wide." | "The item is 10 in. wide." |
| | If the previous word is not a number, then break at the trailing period. | | "Come in. Sit down." | "Come in.", "Sit down." |
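The usage types in the table above can be tried out with a short sketch like the following. It assumes Text Analytics Toolbox is available and that the Usage variable accepts string values. The example texts are adapted from the table, and the variable names tbl, str, and tdetails are arbitrary.

```matlab
% Build the abbreviation table. The table function takes the variable
% names Abbreviation and Usage from the workspace variable names.
Abbreviation = ["appt."; "Dr."; "fig."; "in."];
Usage = ["regular"; "inner"; "reference"; "unit"];
tbl = table(Abbreviation, Usage);

% Example texts adapted from the table above, one document per row.
str = [
    "Book an appt. today."
    "See fig. 3."
    "The item is 10 in. wide. Come in. Sit down."];
documents = tokenizedDocument(str);

% Sentence detection runs as part of this call and uses the table.
documents = addPartOfSpeechDetails(documents,'Abbreviations',tbl);

% Compare SentenceNumber against the expected breaks for each usage type.
tdetails = tokenDetails(documents);
tdetails(:,{'Token','DocumentNumber','SentenceNumber','PartOfSpeech'})
```

Comparing the SentenceNumber column with the Detected Sentences column of the table is a quick way to check that each abbreviation has the intended usage type.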
The default value is the output of the abbreviations function. For Japanese and Korean text, abbreviations do not usually impact sentence detection.
Tip
By default, the function treats single-letter abbreviations, such as "V.", or tokens with mixed single letters and periods, such as "U.S.A.", as regular abbreviations. You do not need to include these abbreviations in Abbreviations.
Data Types: char | string | table | cell