
Dimensionality and Data Size Reduction Using Singular Value Decomposition

In this paper, we discuss how to use singular value decomposition (SVD) to reduce data size as a preprocessing step before applying machine learning algorithms. Data reduction can lead to more efficient, and possibly better-performing, machine learning models, especially when datasets are large, noisy, or high-dimensional. Specifically, we demonstrate two methods: principal component analysis (PCA) and data compression using SVD. For each method, we explain how it works and report the execution time, memory usage, and data reduction ratio, measured with the random forest classification algorithm. All demonstrations are implemented in Python, and the code is provided.
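As a rough illustration of the two methods named above, the following sketch applies PCA and rank-k SVD compression using only NumPy. It is not the paper's code: the function names (`pca_via_svd`, `svd_compress`, `svd_reconstruct`), the synthetic data, and the choice k = 5 are assumptions made for this example, and the random forest step is omitted.

```python
import numpy as np


def pca_via_svd(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA requires mean-centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]  # rows are the principal directions
    return Xc @ components.T, components, mean


def svd_compress(X, k):
    """Keep only the rank-k SVD factors of X instead of X itself."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k]


def svd_reconstruct(U_k, s_k, Vt_k):
    """Rebuild the rank-k approximation of the original matrix."""
    return (U_k * s_k) @ Vt_k


# Hypothetical data: 200 samples, 50 features, approximately rank 5 plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

# Method 1: PCA reduces the number of features from 50 to 5.
Z, components, mean = pca_via_svd(X, k=5)

# Method 2: SVD compression stores three small factors instead of X.
U_k, s_k, Vt_k = svd_compress(X, k=5)
X_hat = svd_reconstruct(U_k, s_k, Vt_k)

stored = U_k.size + s_k.size + Vt_k.size
ratio = stored / X.size
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)

print("PCA output shape:", Z.shape)
print("stored values / original values:", ratio)
print("relative reconstruction error:", rel_error)
```

In this sketch, PCA changes the shape of the feature matrix that a downstream classifier would see, whereas SVD compression keeps the original shape after reconstruction and saves space only while the data is stored as factors. How well either works on real data depends on how quickly the singular values decay.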

Ben Kim
Seattle University
United States