Metadata-Version: 2.1
Name: sparsepca
Version: 0.2.3
Summary: Sparse Principal Component Analysis in Python
Home-page: UNKNOWN
Author: Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou
Author-email: lzxue@psu.edu
License: MIT
Project-URL: Documentation, https://xinging-birds.github.io/AManPG/
Project-URL: Source, https://github.com/xinging-birds/AManPG
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE

![pypi version](https://img.shields.io/pypi/v/sparsepca.svg)![python version](https://img.shields.io/pypi/pyversions/sparsepca.svg) ![pypi downloads](https://img.shields.io/pypi/dm/sparsepca)

Uses an alternating manifold proximal gradient (A-ManPG) method to find sparse principal component loadings from the given data or covariance matrix. 

Requires numpy to be installed.

The GitHub repository can be found [here](https://github.com/xinging-birds/AManPG).

## Usage

```python
spca(z, lambda1, lambda2, 
     x0=None, y0=None, k=0, gamma=0.5, type=0, 
     maxiter=1e4, tol=1e-5, f_palm=1e5,
	 normalize=True, verbose=False):
```

## Arguments

| Name | Type | Description |
| --- | --- | --- |
| `z` | numpy.ndarray | Either the data matrix or sample covariance matrix |
| `lambda1` | float list | List of parameters of length n for L1-norm penalty |
| `lambda2` | float or numpy.inf | L2-norm penalty term |
| `x0` | numpy.ndarray | Initial x-values for the gradient method, default value is the first n right singular vectors |
| `y0` | numpy.ndarray | Initial y-values for the gradient method, default value is the first n right singular vectors |
| `k` | int | Number of principal components desired, default is 0 (returns min(n-1, p) principal components) |
| `gamma` | float | Parameter to control how quickly the step size changes in each iteration, default is 0.5 |
| `type` | int | If 0, b is expected to be a data matrix, and otherwise b is expected to be a covariance matrix; default is 0 |
| `maxiter` | int | Maximum number of iterations allowed in the gradient method, default is 1e4 |
| `tol` | float | Tolerance value required to indicate convergence (calculated as difference between iteration f-values), default is 1e-5 |
| `f_palm` | float | Upper bound for the F-value to reach convergence, default is 1e5 |
| `normalize` | bool | Center and normalize rows to Euclidean length 1 if True, default is True |
| `verbose` | bool | Function prints progress between iterations if True, default is False |

## Value

Returns a dictionary with the following key-value pairs:

| Key | Value Type | Value |
| --- | --- | --- |
| `loadings` | numpy.ndarray | Loadings of the sparse principal components |
| `f_manpg` | float | Final F-value |
| `x` | numpy.ndarray | Corresponding ndarray in subproblem to the loadings |
| `iter` | int | Total number of iterations executed |
| `sparsity` | float | Number of sparse loadings (loadings == 0) divided by number of all loadings |
| `time` | float | Execution time in seconds |

## Authors

Shixiang Chen, Justin Huang, Benjamin Jochem, Shiqian Ma, Lingzhou Xue, and Hui Zou

## References

Chen, S., Ma, S., Xue, L., and Zou, H. (2020) "An Alternating Manifold Proximal Gradient Method for Sparse Principal Component Analysis and Sparse Canonical Correlation Analysis" *INFORMS Journal on Optimization* 2:3, 192-208.

Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2), 265-286.

Zou, H., & Xue, L. (2018). A selective overview of sparse principal component analysis. Proceedings of the IEEE, 106(8), 1311-1320.

## Example

See `sparsepca.py` for a more in-depth example.

```python
import numpy as np
from sparsepca import spca_amanpg

k = 4  # columns
d = 500  # dimensions
m = 1000  # sample size
lambda1 = 0.1 * np.ones((k, 1))
lambda2 = 1

np.random.seed(10)
a = np.random.normal(0, 1, size=(m, d))  # generate random normal 1000x500 matrix
fin_sprout = spca_amanpg(a, lambda1, lambda2, k=k)
print(f"Finite: {fin_sprout['iter']} iterations with final value 
		{fin_sprout['f_manpg']}, sparsity {fin_sprout['sparsity']}, 
		timediff {fin_sprout['time']}.")

fin_sprout['loadings']

inf_sprout = spca_amanpg(a, lambda1, np.inf, k=4)
print(f"Infinite: {inf_sprout['iter']} iterations with final value 
		{inf_sprout['f_manpg']}, sparsity {inf_sprout['sparsity']}, 
		timediff {inf_sprout['time']}.")

inf_sprout['loadings']
```


# History

## 0.2.3

- Doc fixes

## 0.2.2

- Doc fixes
- PyPI metadata fixes

## 0.2.1

- Doc fixes

## 0.2.0

- Initial PyPI release


