
mmrScores

Document scoring with Maximal Marginal Relevance (MMR) algorithm

Description


scores = mmrScores(documents,queries) scores documents according to their relevance to queries, avoiding redundancy, using the MMR algorithm. The score scores(i,j) is the MMR score of documents(i) relative to queries(j).

scores = mmrScores(bag,queries) scores the documents encoded by the bag-of-words or bag-of-n-grams model bag relative to queries. The score scores(i,j) is the MMR score of the ith document in bag relative to queries(j).

scores = mmrScores(___,lambda) also specifies the trade-off between relevance and redundancy.

Examples


Create an array of input documents.

str = [
    "the quick brown fox jumped over the lazy dog"
    "the fast fox jumped over the lazy dog"
    "the dog sat there and did nothing"
    "the other animals sat there watching"];
documents = tokenizedDocument(str)

documents = 
  4x1 tokenizedDocument:

    9 tokens: the quick brown fox jumped over the lazy dog
    8 tokens: the fast fox jumped over the lazy dog
    7 tokens: the dog sat there and did nothing
    6 tokens: the other animals sat there watching

Create an array of query documents.

str = [
    "a brown fox leaped over the lazy dog"
    "another fox leaped over the dog"];
queries = tokenizedDocument(str)

queries = 
  2x1 tokenizedDocument:

    8 tokens: a brown fox leaped over the lazy dog
    6 tokens: another fox leaped over the dog

Calculate MMR scores using the mmrScores function. The output is a sparse matrix.

scores = mmrScores(documents,queries);

Visualize the MMR scores in a heat map.

figure
heatmap(scores);
xlabel("Query Document")
ylabel("Input Document")
title("MMR Scores")

Figure contains an object of type heatmap. The chart of type heatmap has title MMR Scores.

Higher scores correspond to stronger relevance to the query documents.

Create an array of input documents.

str = [
    "the quick brown fox jumped over the lazy dog"
    "the quick brown fox jumped over the lazy dog"
    "the fast fox jumped over the lazy dog"
    "the dog sat there and did nothing"
    "the other animals sat there watching"
    "the other animals sat there watching"];
documents = tokenizedDocument(str);

Create a bag-of-words model from the input documents.

bag = bagOfWords(documents)

bag = 
  bagOfWords with properties:

          Counts: [6x17 double]
      Vocabulary: ["the"  "quick"  "brown"  "fox"  ...]
        NumWords: 17
    NumDocuments: 6

Create an array of query documents.

str = [
    "a brown fox leaped over the lazy dog"
    "another fox leaped over the dog"];
queries = tokenizedDocument(str)

queries = 
  2x1 tokenizedDocument:

    8 tokens: a brown fox leaped over the lazy dog
    6 tokens: another fox leaped over the dog

Calculate the MMR scores. The output is a sparse matrix.

scores = mmrScores(bag,queries);

Visualize the MMR scores in a heat map.

figure
heatmap(scores);
xlabel("Query Document")
ylabel("Input Document")
title("MMR Scores")

Figure contains an object of type heatmap. The chart of type heatmap has title MMR Scores.

Now calculate the scores again, and set the lambda value to 0.01. When the lambda value is close to 0, redundant documents yield lower scores and diverse (but less query-relevant) documents yield higher scores.

lambda = 0.01;
scores = mmrScores(bag,queries,lambda);

Visualize the MMR scores in a heat map.

figure
heatmap(scores);
xlabel("Query Document")
ylabel("Input Document")
title("MMR Scores, lambda = " + lambda)

Figure contains an object of type heatmap. The chart of type heatmap has title MMR Scores, lambda = 0.01.

Finally, calculate the scores again with the lambda value set to 1. When the lambda value is 1, query-relevant documents yield higher scores even if they are redundant with other high-scoring documents.

lambda = 1;
scores = mmrScores(bag,queries,lambda);

Visualize the MMR scores in a heat map.

figure
heatmap(scores);
xlabel("Query Document")
ylabel("Input Document")
title("MMR Scores, lambda = " + lambda)

Figure contains an object of type heatmap. The chart of type heatmap has title MMR Scores, lambda = 1.

Input Arguments


Input documents, specified as a tokenizedDocument array, a string array of words, or a cell array of character vectors. If documents is not a tokenizedDocument array, then it must be a row vector representing a single document, where each element is a word. To specify multiple documents, use a tokenizedDocument array.

Input bag-of-words or bag-of-n-grams model, specified as a bagOfWords object or a bagOfNgrams object. If bag is a bagOfNgrams object, then the function treats each n-gram as a single word.

Set of query documents, specified as one of the following:

  • A tokenizedDocument array

  • A 1-by-N string array representing a single document, where each element is a word

  • A 1-by-Ncell array of character vectors representing a single document, where each element is a word

To compute term frequency and inverse document frequency statistics, the function encodes queries using a bag-of-words model. The model it uses depends on the syntax you call it with. If you specify the input argument documents, then the function encodes queries using bagOfWords(documents). If you specify bag, then the function encodes queries using bag and then uses the resulting tf-idf matrix.
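The general idea of encoding a query against a fixed vocabulary can be sketched as follows. This is an illustrative sketch in Python rather than MATLAB, with a toy vocabulary and a hypothetical encode helper, not the function's actual implementation; the key behavior it demonstrates is that query words absent from the encoding model's vocabulary contribute nothing to the encoded vector.

```python
# Toy vocabulary standing in for a bag-of-words model's Vocabulary property.
vocabulary = ["the", "quick", "brown", "fox", "lazy", "dog"]

def encode(words, vocabulary):
    """Count occurrences of each vocabulary word in the query.

    Words outside the vocabulary are simply dropped, which is why the
    choice of encoding model (documents vs. bag) affects the scores.
    """
    index = {w: i for i, w in enumerate(vocabulary)}
    counts = [0] * len(vocabulary)
    for w in words:
        if w in index:  # out-of-vocabulary words are ignored
            counts[index[w]] += 1
    return counts

query = "a brown fox leaped over the lazy dog".split()
encoded = encode(query, vocabulary)
# "a", "leaped", and "over" are out of vocabulary, so only
# "brown", "fox", "the", "lazy", and "dog" are counted.
```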

Trade-off between relevance and redundancy, specified as a nonnegative scalar.

When lambda is close to 0, redundant documents yield lower scores and diverse (but less query-relevant) documents yield higher scores. When lambda is 1, query-relevant documents yield higher scores even if they are redundant with other high-scoring documents.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Output Arguments


MMR scores, returned as an N1-by-N2 matrix, where scores(i,j) is the MMR score of documents(i) relative to the jth query document, and N1 and N2 are the number of input and query documents, respectively.

A document has a high MMR score if it is both relevant to the query and has minimal similarity relative to the other documents.
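This trade-off follows the standard MMR formulation of Carbonell and Goldstein [1]: a document's score is its similarity to the query, discounted by its maximum similarity to the other documents, with lambda weighting the two terms. The sketch below (illustrative Python, not MathWorks' implementation; the raw term-count vectors, cosine similarity, and helper names are assumptions for demonstration) shows how lambda shifts the scores between pure relevance and diversity.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_score(i, query, docs, lam):
    """MMR(i) = lam * sim(doc_i, query) - (1 - lam) * max_{k != i} sim(doc_i, doc_k)."""
    relevance = cosine_sim(docs[i], query)
    redundancy = max(cosine_sim(docs[i], docs[k])
                     for k in range(len(docs)) if k != i)
    return lam * relevance - (1 - lam) * redundancy

# Two duplicate query-relevant docs and one diverse doc, over a toy
# 3-term vocabulary.
docs = [[2, 1, 0], [2, 1, 0], [0, 0, 3]]
query = [1, 1, 0]

# With lam = 1 only relevance matters, so the relevant duplicates win.
rel_only = [mmr_score(i, query, docs, 1.0) for i in range(3)]
# With lam close to 0 the duplicates are penalized for redundancy,
# so the diverse document scores highest.
diverse = [mmr_score(i, query, docs, 0.01) for i in range(3)]
```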

References

[1] Carbonell, Jaime G., and Jade Goldstein. "The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries." In SIGIR, vol. 98, pp. 335-336. 1998.

Version History

Introduced in R2020a