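This example shows how to analyze text data containing emojis.
Emojis are pictorial symbols that appear inline in text. When writing text on mobile devices such as smartphones and tablets, people use emojis to keep the text short and to convey emotion and feelings.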
You also can use emojis to analyze text data. For example, use them to identify relevant strings of text or to visualize the sentiment or emotion of the text.
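When working with text data, emojis can behave unpredictably. Depending on your system fonts, your system might not display some emojis correctly. Therefore, if an emoji is not displayed correctly, then the data is not necessarily missing; your system might simply be unable to display the emoji in the current font.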
In most cases, you can read emojis from a file (for example, by using extractFileText, extractHTMLText, or readtable) or by copying and pasting them directly into MATLAB®. Otherwise, you must compose the emoji using Unicode UTF16 code units.
Some emojis consist of multiple Unicode UTF16 code units. For example, the "smiling face with sunglasses" emoji (😎 with code point U+1F60E) is a single glyph but comprises two UTF16 code units, "D83D" and "DE0E". Create a string containing this emoji using the compose function, and specify the two code units with the prefix "\x".
emoji = compose("\xD83D\xDE0E")
emoji = "😎"
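The two code units "D83D" and "DE0E" follow from the UTF-16 surrogate-pair rule for code points above U+FFFF. As a minimal sketch (the variable names here are illustrative and not part of the original example), the arithmetic for U+1F60E can be checked like this:

```matlab
% Sketch: derive the UTF-16 surrogate pair for a code point above U+FFFF.
% Variable names are illustrative assumptions, not from the original example.
codePoint = hex2dec('1F60E');               % "smiling face with sunglasses"
offset = codePoint - hex2dec('10000');      % 20-bit offset into the supplementary planes
highUnit = hex2dec('D800') + floor(offset / 1024);  % high surrogate from the top 10 bits
lowUnit = hex2dec('DC00') + mod(offset, 1024);      % low surrogate from the bottom 10 bits
codeUnits = dec2hex([highUnit; lowUnit])    % 2-by-4 char array: 'D83D' and 'DE0E'
```

The resulting hex values are the ones passed to compose with the "\x" prefix.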
First get the Unicode UTF16 code units of an emoji. Use char to get the numeric representation of the emoji, and then use dec2hex to get the corresponding hex values.
codeUnits = dec2hex(char(emoji))
codeUnits = 2×4 char array
    'D83D'
    'DE0E'
Then join the code units, each prefixed with "\x", into a single format specifier by using the strjoin function with the empty delimiter "".
formatSpec = strjoin("\x" + codeUnits,"")
formatSpec = "\xD83D\xDE0E"
emoji = compose(formatSpec)
emoji = "😎"
Extract the text data in the file weekendUpdates.xlsx using readtable. The file weekendUpdates.xlsx contains status updates containing the hashtags "#weekend" and "#vacation".
filename = "weekendUpdates.xlsx";
tbl = readtable(filename,'TextType','string');
head(tbl)
ans = 8×2 table
    ID    TextData
    __    __________________________________________________________________________________
    1     "Happy anniversary! ❤ Next stop: Paris! ✈ #vacation"
    2     "Haha, BBQ on the beach, engage smug mode! ❤ #vacation"
    3     "getting ready for Saturday night #yum #weekend "
    4     "Say it with me - I NEED A #VACATION!!! ☹"
    5     " Chilling at home for the first time in ages…This is the life! #weekend"
    6     "My last #weekend before the exam ."
    7     "can’t believe my #vacation is over so unfair"
    8     "Can’t wait for tennis this #weekend "
Extract the text data from the field TextData and view the first few status updates.
textData = tbl.TextData;
textData(1:5)
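ans = 5×1 string
    "Happy anniversary! ❤ Next stop: Paris! ✈ #vacation"
    "Haha, BBQ on the beach, engage smug mode! ❤ #vacation"
    "getting ready for Saturday night #yum #weekend "
    "Say it with me - I NEED A #VACATION!!! ☹"
    " Chilling at home for the first time in ages…This is the life! #weekend"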
Visualize the text data in a word cloud.
figure
wordcloud(textData);
Identify the status updates that contain a particular emoji using the contains function. Find the indices of the documents containing the "smiling face with sunglasses" emoji (😎 with code U+1F60E). This emoji comprises the two Unicode UTF16 code units "D83D" and "DE0E".
emoji = compose("\xD83D\xDE0E");
idx = contains(textData,emoji);
textDataSunglasses = textData(idx);
textDataSunglasses(1:5)
ans = 5×1 string
    "Haha, BBQ on the beach, engage smug mode! ❤ #vacation"
    "getting ready for Saturday night #yum #weekend "
    " Chilling at home for the first time in ages…This is the life! #weekend"
    " Check the out-of-office crew, we are officially ON #VACATION!! "
    "Who needs a #vacation when the weather is this good ☀ "
Visualize the extracted text data in a word cloud.
figure
wordcloud(textDataSunglasses);
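The logical index returned by contains can also be used to count matches. The following is a minimal sketch that uses a small hypothetical string array (sampleText is made up for illustration and does not come from weekendUpdates.xlsx):

```matlab
% Sketch: count how many strings contain a given emoji.
% sampleText is hypothetical illustration data, not the weekendUpdates.xlsx content.
emoji = compose("\xD83D\xDE0E");        % smiling face with sunglasses, U+1F60E
sampleText = [
    "Beach day " + emoji
    "Back to work"
    emoji + " sunglasses on"];

idx = contains(sampleText, emoji);      % logical column vector, one entry per string
numMatches = nnz(idx)                   % number of strings that contain the emoji
```

Because contains matches the emoji as a sequence of code units, the same pattern applies to any emoji composed with the "\x" prefix.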
Visualize all the emojis in text data using a word cloud.
Extract the emojis. First tokenize the text using tokenizedDocument, and then view the first few documents.
documents = tokenizedDocument(textData);
documents(1:5)
ans =
  5×1 tokenizedDocument:

    11 tokens: Happy anniversary ! ❤ Next stop : Paris ! ✈ #vacation
    16 tokens: Haha , BBQ on the beach , engage smug mode ! ❤ #vacation
     9 tokens: getting ready for Saturday night #yum #weekend
    13 tokens: Say it with me - I NEED A #VACATION ! ! ! ☹
    19 tokens: Chilling at home for the first time in ages … This is the life ! #weekend
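The tokenizedDocument function automatically detects emojis and assigns them the token type "emoji". View the first few token details of the documents using the tokenDetails function.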
tdetails = tokenDetails(documents);
head(tdetails)
ans = 8×5 table
    Token            DocumentNumber    LineNumber    Type           Language
    _____________    ______________    __________    ___________    ________
    "Happy"          1                 1             letters        en
    "anniversary"    1                 1             letters        en
    "!"              1                 1             punctuation    en
    "❤"              1                 1             emoji          en
    "Next"           1                 1             letters        en
    "stop"           1                 1             letters        en
    ":"              1                 1             punctuation    en
    "Paris"          1                 1             letters        en
Visualize the emojis in a word cloud by extracting the tokens with token type "emoji" and inputting them into the wordcloud function.
idx = tdetails.Type == "emoji";
tokens = tdetails.Token(idx);
figure
wordcloud(tokens);
title("Emojis")
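If numeric counts are needed alongside the word cloud, the emoji tokens can be tallied directly. This is a minimal sketch, assuming Text Analytics Toolbox is available. The sample strings and the variable names uniqueEmojis and counts are hypothetical and are not part of the original example:

```matlab
% Sketch: tally how often each emoji token occurs.
% sampleText is hypothetical illustration data, not the weekendUpdates.xlsx content.
sunglasses = compose("\xD83D\xDE0E");   % U+1F60E
smile = compose("\xD83D\xDE0A");        % U+1F60A
sampleText = [
    "Beach day " + sunglasses
    "So relaxed " + smile + " " + sunglasses
    "Back to work"];

documents = tokenizedDocument(sampleText);
tdetails = tokenDetails(documents);
tokens = tdetails.Token(tdetails.Type == "emoji");  % keep only emoji tokens

[uniqueEmojis, ~, groupIdx] = unique(tokens);       % map each token to its group
counts = accumarray(groupIdx, 1);                   % occurrences per unique emoji
emojiCounts = table(uniqueEmojis, counts)
```

The same tallying step could be applied to the tokens variable extracted from tdetails in the example.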