Topic Modeling – Latent Semantic Analysis (LSA) and Singular Value Decomposition (SVD):
- Singular Value Decomposition (SVD) is a linear-algebraic technique used in many areas such as machine learning (Principal Component Analysis, Latent Semantic Analysis, recommender systems and word embeddings), data mining and bioinformatics.
- The technique decomposes a given matrix into three matrices; let's look at the technique from a programmer's perspective.
[codesyntax lang="python"]
import numpy as np
import pandas as pd
[/codesyntax]
Let's take a small dataset and create a Document-Term matrix.
- Note: for a query-retrieval (IR) task we would build a Term-Document matrix instead, which is simply the transpose of the Document-Term matrix; a short sketch follows the output below.
[codesyntax lang="python"]
documents = ["I enjoy Machine Learning", "I like NLP", "I like deep learning"]

from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(ngram_range=(1, 1))
X = cv.fit_transform(documents).toarray()
[/codesyntax]
X is the Document-Term matrix: documents are the rows and tokens are the columns of the matrix.
Output:
array([[0, 1, 1, 0, 1, 0],
       [0, 0, 0, 1, 0, 1],
       [1, 0, 1, 1, 0, 0]], dtype=int64)
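As noted above, an IR-style query-retrieval setup works on the Term-Document matrix, which is just the transpose of X. A minimal sketch of that idea (the query text here is only an illustrative assumption):
[codesyntax lang="python"]
# Term-Document matrix: terms are rows, documents are columns
X_td = X.T                                                  # shape (6, 3)

# Represent a query in the same vocabulary and score documents
# with a simple dot product against the term-document matrix.
query_vec = cv.transform(["deep learning"]).toarray()[0]    # shape (6,)
scores = np.dot(query_vec, X_td)                            # one score per document
print(scores)   # the third document ("I like deep learning") scores highest
[/codesyntax]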
[codesyntax lang="python"]
cv.get_feature_names()
[/codesyntax]
Output: ['deep', 'enjoy', 'learning', 'like', 'machine', 'nlp']
[codesyntax lang="python"]
cv.vocabulary_
[/codesyntax]
Output: {'enjoy': 1, 'machine': 4, 'learning': 2, 'like': 3, 'nlp': 5, 'deep': 0}
Notation
- “*” – denotes matrix multiplication
- V.T – denotes the transpose of matrix V
Now SVD will decompose the rectangular matrix X into three matrices, namely U, S and V.T (V transpose).
- U – the left singular vector (orthogonal) matrix; its columns are the eigenvectors of X * X.T and capture Document-to-Document similarity.
- V – the right singular vector (orthogonal) matrix; its columns are the eigenvectors of X.T * X and capture the strength with which two different words occur in the same documents (which may or may not be exact co-occurrence).
- S – the diagonal matrix of singular values (their squares are the eigenvalues of X.T * X); it explains the variance ratio. A quick sanity check of these shapes follows below.
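Before walking through the math, here is a minimal sketch of what the decomposition looks like numerically with NumPy (the shapes shown are for the 3 x 6 matrix X built above):
[codesyntax lang="python"]
# Decompose the Document-Term matrix X (3 x 6) into U, S and V.T
U, s, Vh = np.linalg.svd(X)          # Vh is V.T; s holds the singular values

print(U.shape, s.shape, Vh.shape)    # (3, 3) (3,) (6, 6)

# Reconstruct X from the three factors: X = U * S * V.T
S = np.zeros((U.shape[0], Vh.shape[0]))
np.fill_diagonal(S, s)
print(np.allclose(X, U @ S @ Vh))    # True
[/codesyntax]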
Before we understand how U, V and S extract relation strength or explain the variance ratio, let's understand how to decompose X.
- To decompose a matrix this way, we use Eigen vector / Eigen value decomposition.
- Eigen decomposition of this kind can be applied only to a square, symmetric matrix, which the rectangular X is not.
- We can get a square, symmetric matrix by multiplying X.T with X or X with X.T; decomposing np.dot(X.T, X) and np.dot(X, X.T) gives us the U, S and V matrices (a quick symmetry check follows this list).
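A quick check, just to confirm that both products are square and symmetric:
[codesyntax lang="python"]
XtX = np.dot(X.T, X)   # 6 x 6 word-word matrix
XXt = np.dot(X, X.T)   # 3 x 3 document-document matrix

# Both products are square and symmetric, so Eigen decomposition applies
print(XtX.shape, np.allclose(XtX, XtX.T))   # (6, 6) True
print(XXt.shape, np.allclose(XXt, XXt.T))   # (3, 3) True
[/codesyntax]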
Why multiply X.T with X or X with X.T? Why decompose X.T * X or X * X.T?
- Suppose that X is decomposed as U * S * V.T
- i.e., X = U * S * V.T
- Now let's see what X.T multiplied by X is
- X.T * X = (U * S * V.T).T * (U * S * V.T)
- = (V * S.T * U.T) * (U * S * V.T)
- = V * S.T * (U.T * U) * S * V.T
- = V * S.T * (I) * S * V.T # as U is an orthogonal matrix, U.T * U = I
- = V * (S.T * S) * V.T
- = V * S^2 * V.T # as S is a diagonal matrix, S.T * S is S-squared
- We get V * S^2 * V.T when we multiply X.T with X, where X is in its decomposed form (U * S * V.T). Hence, when we decompose X.T * X, we get the matrices V and S^2.
- Now let's see what X multiplied by X.T is
- X * X.T = (U * S * V.T) * (U * S * V.T).T
- = (U * S * V.T) * (V * S.T * U.T)
- = U * S * (V.T * V) * S.T * U.T
- = U * S * (I) * S.T * U.T # as V is an orthogonal matrix, V.T * V = I
- = U * (S * S.T) * U.T
- = U * S^2 * U.T # as S is a diagonal matrix, S * S.T is S-squared
- We get U * S^2 * U.T when we multiply X with X.T, where X is in its decomposed form (U * S * V.T). Hence, when we decompose X * X.T, we get the matrices U and S^2.
- Bottom line: to find U, S and V, we need to find the Eigen decomposition of X.T * X and X * X.T.
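A minimal numerical sanity check of this bottom line: the eigenvalues of X * X.T (and the non-zero eigenvalues of X.T * X) should match the squares of the singular values returned by np.linalg.svd.
[codesyntax lang="python"]
# Eigen decomposition of the two symmetric products
eigvals_w, V_eig = np.linalg.eigh(np.dot(X.T, X))   # word-word matrix (6 x 6)
eigvals_d, U_eig = np.linalg.eigh(np.dot(X, X.T))   # document-document matrix (3 x 3)

# Singular values from SVD
_, s, _ = np.linalg.svd(X)

# Eigenvalues of X * X.T, sorted in descending order, equal the squared singular values
print(np.round(np.sort(eigvals_d)[::-1], 4))
print(np.round(s ** 2, 4))
[/codesyntax]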
X.T * X is the word-similarity matrix: the (i, j) entry of X.T * X quantifies the overlap of word i and word j, i.e., the number of documents in which both words occur. (This inference is based on the count Document-Term matrix; if it were TF-IDF, the interpretation would be different.)
[codesyntax lang="python"]
np.dot(X.T, X)
[/codesyntax]
Output:
array([[1, 0, 1, 1, 0, 0],
       [0, 1, 1, 0, 1, 0],
       [1, 1, 2, 1, 1, 0],
       [1, 0, 1, 2, 0, 1],
       [0, 1, 1, 0, 1, 0],
       [0, 0, 0, 1, 0, 1]], dtype=int64)
X * X.T is the document-similarity matrix: the (i, j) entry of X * X.T quantifies the similarity of document i and document j. This analogy works best with TF-IDF scores; a TF-IDF variant is sketched after the output below.
[codesyntax lang="python"]
np.dot(X, X.T)
[/codesyntax]
Output:
array([[3, 0, 1],
       [0, 2, 1],
       [1, 1, 3]], dtype=int64)
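A small sketch of the TF-IDF variant mentioned above, using TfidfVectorizer in place of CountVectorizer (exact scores will differ from the count-based matrix because of TF-IDF weighting and normalization):
[codesyntax lang="python"]
from sklearn.feature_extraction.text import TfidfVectorizer

# Same corpus, but weighted by TF-IDF instead of raw counts
tfidf = TfidfVectorizer(ngram_range=(1, 1))
X_tfidf = tfidf.fit_transform(documents).toarray()

# Document-document similarity; rows are L2-normalized by default,
# so these dot products behave like cosine similarities
doc_sim = np.dot(X_tfidf, X_tfidf.T)
print(np.round(doc_sim, 2))
[/codesyntax]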
The rows of the singular vector matrix V.T are the topics, and the values in each row denote the strength of the words in that topic.
[codesyntax lang="python"]
U, s, Vh = np.linalg.svd(X)

# Feature names in column order, so they line up with the values in each row of Vh
terms = cv.get_feature_names()

for i, component in enumerate(Vh):
    terms_components = zip(terms, component)
    # take the top-3 features for the topic
    sorted_terms = sorted(terms_components, key=lambda x: x[1], reverse=True)[:3]
    print("topic : ", i)
    for term_scores in sorted_terms:
        print(10 * " ", term_scores[0])
    print(50 * '*')
[/codesyntax]
Output:
topic : 0
          deep
          nlp
          machine
**************************************************
topic : 1
          nlp
          machine
          learning
**************************************************
topic : 2
          enjoy
          learning
          like
**************************************************
topic : 3
          nlp
          enjoy
          like
**************************************************
topic : 4
          nlp
          learning
          like
**************************************************
topic : 5
          deep
          enjoy
          nlp
**************************************************
The above result may not make sense as the dataset is too small. With a decent amount of data, the results will be much more meaningful.
Cons:
- SVD is a linear decomposition, hence it cannot extract non-linear patterns.
- SVD computation is very expensive and cannot handle huge volumes of data (randomized SVD is one of the solutions; a sketch follows below).
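As a sketch of the randomized-SVD workaround, scikit-learn's TruncatedSVD can be used; it runs a randomized algorithm by default, and the number of components chosen here is only an illustrative assumption:
[codesyntax lang="python"]
from sklearn.decomposition import TruncatedSVD

# Keep only the top-2 components (topics) instead of computing the full SVD
svd = TruncatedSVD(n_components=2, algorithm="randomized", random_state=42)
X_topics = svd.fit_transform(cv.fit_transform(documents))   # document-topic matrix

print(X_topics.shape)                    # (3, 2)
print(svd.components_.shape)             # (2, 6) topic-term matrix (rows of V.T)
print(svd.explained_variance_ratio_)     # variance explained by each topic
[/codesyntax]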
–By
L Venkata Rama Raju