An Overview Of Latent Semantic Indexing
Latent Semantic Indexing Words: 262 .txt 3.63 KB Modified: 2008-03-29 13:26:46 UTC

Latent semantic indexing (LSI) is a technique that
projects queries and documents into a space with latent
semantic dimensions.

In the latent semantic space, a query and a document
can be similar even if they share no terms, as long as
their terms are semantically similar.
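
As a minimal sketch of this effect, the toy example below
builds a small term-by-document matrix and compares a
query with a document that shares no terms with it. The
vocabulary, the documents, the choice of two latent
dimensions, and the helper names (cosine, to_latent) are
all assumptions made for illustration.

```python
import numpy as np

# Hypothetical vocabulary (rows) and documents (columns):
# d0 = {car, engine}, d1 = {automobile, engine},
# d2 = {flower, petal}, d3 = {flower}
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)


def cosine(x, y):
    """Cosine similarity between two vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


# Keep the k strongest latent dimensions (k = 2 is an assumption).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k = U[:, :k]


def to_latent(term_vector):
    """Project a term-space vector onto the k latent dimensions."""
    return U_k.T @ term_vector


query = np.array([1, 0, 0, 0, 0], dtype=float)  # the single term "car"
doc1 = A[:, 1]  # {automobile, engine}: no term in common with the query
doc2 = A[:, 2]  # {flower, petal}: unrelated topic

raw_sim = cosine(query, doc1)
lsi_sim = cosine(to_latent(query), to_latent(doc1))
lsi_sim_unrelated = cosine(to_latent(query), to_latent(doc2))

print("term overlap similarity:", raw_sim)
print("latent similarity (related doc):", round(lsi_sim, 3))
print("latent similarity (unrelated doc):", round(lsi_sim_unrelated, 3))
```

In this toy data, "car" and "automobile" never occur
together, but both occur with "engine", and that shared
context is what pulls them onto the same latent dimension.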

LSI can be viewed as a similarity metric that is an
alternative to word overlap measures. The latent
semantic space has fewer dimensions than the original
space, so LSI is also a method for dimensionality
reduction.

Dimensionality reduction takes a set of objects that
exist in a high-dimensional space and represents them
in a lower-dimensional space instead.

The objects are often represented in a two- or
three-dimensional space purely for the purpose of
visualization. LSI is the application of a mathematical
technique called singular value decomposition (SVD) to
a term-by-document matrix.
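
The decomposition itself can be sketched with NumPy's
numpy.linalg.svd. The count matrix below is made up for
illustration, and scaling the coordinates by the singular
values is one common convention, assumed here rather than
taken from the text.

```python
import numpy as np

# Hypothetical term-by-document count matrix
# (rows: terms, columns: documents).
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted descending.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
reconstructed = U @ np.diag(s) @ Vt

# Keep only the two strongest dimensions, e.g. for a 2-D plot.
k = 2
doc_coords_2d = (np.diag(s[:k]) @ Vt[:k, :]).T  # one row per document
term_coords_2d = U[:, :k] * s[:k]               # one row per term

print("singular values:", np.round(s, 3))
print("document coordinates:\n", np.round(doc_coords_2d, 3))
```

Each row of doc_coords_2d could be passed to a plotting
library to draw the documents in two dimensions.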

The projection into the LSI space is chosen so that
the representations in the original space are changed
as little as possible, where the change is measured by
the sum of the squares of the differences.
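
In symbols, this criterion can be sketched as follows.
The notation is assumed here and does not come from the
text: A is the term-by-document matrix, A_k is its
approximation using the k largest singular values, and
sigma_i is the i-th singular value of A.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top},
\qquad
\lVert A - A_k \rVert_F^2
  = \sum_{i,j} \bigl( a_{ij} - (A_k)_{ij} \bigr)^2
  = \sum_{i > k} \sigma_i^2
```

The last equality is the standard result for truncated
SVD: the squared error equals the sum of the squares of
the discarded singular values.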

There are many possible mappings from high-dimensional
to low-dimensional spaces.

LSI chooses the mapping that is optimal in the sense
that it minimizes this distance. Choosing the number of
dimensions is a problem in its own right.
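
To illustrate the optimality claim, the sketch below
compares the SVD-based mapping with one arbitrary
alternative, a projection onto a random orthonormal
basis. The synthetic matrix, the seed, the value k = 3,
and the helper name squared_error are assumptions chosen
only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a term-by-document count matrix.
A = rng.poisson(1.0, size=(30, 12)).astype(float)
k = 3


def squared_error(original, approx):
    """Sum of the squares of the element-wise differences."""
    return float(np.sum((original - approx) ** 2))


# Mapping 1: project onto the top-k left singular vectors (LSI).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]
svd_error = squared_error(A, U_k @ (U_k.T @ A))

# Mapping 2: project onto a random k-dimensional orthonormal basis.
Q, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
random_error = squared_error(A, Q @ (Q.T @ A))

print("SVD mapping error:   ", round(svd_error, 3))
print("random mapping error:", round(random_error, 3))
```

Any other rank-k projection could be substituted for the
random one; under this error measure it should not do
better than the SVD mapping.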

A reduction can remove much of the noise, but keeping
too few dimensions may lose important information. LSI
performance improves considerably after ten to twenty
dimensions and peaks at seventy to one hundred
dimensions.

Beyond that point, performance slowly begins to
diminish. This pattern of performance is observed with
other datasets as well.