An agent in pursuit of a task may work with a corpus containing text documents. One possible task of the agent is to retrieve documents of similar content and highlight relevant locations in retrieved documents. To perform information retrieval on the corpus, the agent may need additional data associated with the documents. Subjective Content Descriptions (SCDs) provide additional location-specific data for text documents. However, the agent needs SCDs referencing sentences of similar content across various documents in the corpus and most text documents are not associated with SCDs. Therefore, this paper presents UESM, an unsupervised approach to estimate SCDs for text documents, i.e., to associate any corpus with SCDs. In an evaluation, we show that the performance of UESM is on par with latent Dirichlet allocation, while UESM provides SCDs referencing sentences of similar content.
|Number of pages||8|
|Publication status||Published - 2023|