Latent Semantic Indexing
Understanding latent semantic indexing in full detail is quite complex, because the underlying method relies on linear algebra that usually takes some mathematical background to follow. Search engines can use several methods to index pages and retrieve the ones relevant to a user's query.

The obvious method is to match the words in a search query against the same words in the text of the available web pages. The problem with simple word matching is that it is often inaccurate, for two reasons. First, there are many different ways for a user to express the concept they are looking for. This is known as synonymy. Second, many words have multiple meanings. This is known as polysemy. Because of synonymy, the words in a user's query may not actually match the text on the relevant pages, so those pages are overlooked. Because of polysemy, the terms in a user's query will often match terms on irrelevant pages.

Latent semantic indexing, or LSI, is an attempt to overcome these problems by looking at the patterns in which words are distributed across an entire collection of pages, such as the web. Pages that have many words in common are considered to be semantically close in meaning, while pages that have few words in common are considered semantically distant. The result is a relative similarity value calculated for every content word or phrase. In response to a query, the LSI database returns the pages it judges to be most relevant to the search. The LSI algorithm doesn't understand anything about what words mean, and it does not require an exact word match to return useful web pages.
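To make the synonymy and polysemy problems concrete, here is a minimal sketch of naive keyword matching in Python. The scoring rule (count the distinct query words a page contains), the helper names and the sample pages are illustrative assumptions, not how any particular search engine works.

```python
# Naive keyword matching: score each page by how many distinct query words it contains.
# The pages, queries and helper names are hypothetical, for illustration only.

def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()

def keyword_score(query, page):
    """Count the distinct query words that also appear in the page."""
    return len(set(tokenize(query)) & set(tokenize(page)))

pages = [
    "used car dealer with low prices",          # about cars
    "jaguar spotted in the rainforest canopy",  # about the animal
    "jaguar unveils a new sports car",          # about the car maker
]

# Synonymy: the query says "automobile" but the relevant pages say "car",
# so no page matches at all.
synonymy_scores = [keyword_score("automobile", p) for p in pages]

# Polysemy: "jaguar" matches the animal page just as well as the car-maker page.
polysemy_scores = [keyword_score("jaguar price", p) for p in pages]

print(synonymy_scores)
print(polysemy_scores)
```

Both car pages score 0 for "automobile", and the rainforest page ties with the car maker's page for "jaguar price", which are exactly the two failure modes described.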
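The description of LSI leaves out the actual mechanics. In the standard formulation, LSI builds a term-document matrix and applies a truncated singular value decomposition (SVD), keeping only the k strongest "concepts"; queries and pages are then compared in that reduced concept space. The sketch below is a minimal illustration under several assumptions: raw word counts (real systems usually apply a weighting such as tf-idf), a tiny made-up corpus, k = 2, and an SVD computed by power iteration in plain Python instead of a linear-algebra library. All function names are hypothetical.

```python
import math

# Hypothetical toy corpus with two topics: cars and cooking.
# Pages 0 and 1 share "engine" and "repair" but use different words for the vehicle.
docs = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "cooking pasta recipe",
    "pasta sauce recipe",
]

def build_matrix(docs):
    """Term-document matrix of raw word counts: rows are terms, columns are documents."""
    terms = sorted({w for d in docs for w in d.split()})
    index = {t: i for i, t in enumerate(terms)}
    A = [[0.0] * len(docs) for _ in terms]
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w]][j] += 1.0
    return terms, index, A

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def top_left_singular_vectors(A, k, iters=500):
    """Top-k left singular vectors of A, via power iteration on A^T A with deflation."""
    m, n = len(A), len(A[0])
    # Document-document matrix M = A^T A (n x n).
    M = [[sum(A[t][i] * A[t][j] for t in range(m)) for j in range(n)] for i in range(n)]
    U = []
    for _ in range(k):
        v = [1.0 + 0.01 * i for i in range(n)]  # arbitrary non-zero start vector
        for _ in range(iters):
            w = [dot(row, v) for row in M]
            length = norm(w)
            if length == 0:
                break
            v = [x / length for x in w]
        lam = dot(v, [dot(row, v) for row in M])  # Rayleigh quotient = sigma^2
        if lam <= 1e-12:
            break
        sigma = math.sqrt(lam)
        # u = A v / sigma is the matching left singular vector: a "concept" over terms.
        U.append([dot(A[t], v) / sigma for t in range(m)])
        # Deflate so the next power iteration converges to the next-largest component.
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return U

def project(U, vec):
    """Coordinates of a term-space vector in the k-dimensional concept space."""
    return [dot(u, vec) for u in U]

def cosine(x, y):
    nx, ny = norm(x), norm(y)
    return dot(x, y) / (nx * ny) if nx and ny else 0.0

terms, index, A = build_matrix(docs)
U = top_left_singular_vectors(A, k=2)

# Each document column, projected into the concept space.
doc_vecs = [project(U, [A[t][j] for t in range(len(terms))]) for j in range(len(docs))]

def lsi_scores(query):
    """Cosine similarity between the projected query and every projected document."""
    q = [0.0] * len(terms)
    for w in query.split():
        if w in index:  # words never seen in the corpus are ignored
            q[index[w]] += 1.0
    qv = project(U, q)
    return [cosine(qv, d) for d in doc_vecs]

scores = lsi_scores("automobile")
for d, s in zip(docs, scores):
    print(f"{s:.2f}  {d}")
```

With k = 2 this toy corpus collapses into a "cars" concept and a "cooking" concept, so "car engine repair" scores close to 1 for the query "automobile" even though it never contains that word, while the pasta pages score close to 0. Choosing k is a trade-off: a small k merges related words aggressively, while a larger k keeps finer distinctions between pages.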