The statistics and mathematics of high dimension low sample size asymptotics

Dan Shen, Haipeng Shen, Hongtu Zhu, J. S. Marron

Research output: Contribution to journal › Article › peer-review

35 Scopus citations

Abstract

The aim of this paper is to establish several theoretical properties of principal component analysis for multiple-component spike covariance models. Our results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone, with a certain angle to its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are explored, and additional theoretical results are presented.
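The dependence of the angle on the ratio between the dimension and the product of the sample size with the spike size can be illustrated with a small simulation. The following is a minimal sketch and not the authors' procedure. It assumes a single-spike Gaussian model with covariance diag(spike, 1, ..., 1) and a known zero mean, so the first population eigenvector is the first coordinate axis. The function names `leading_angle_deg` and `angles_by_ratio`, and the values of `d`, `n`, and the ratios, are hypothetical choices made for illustration. The leading sample eigenvector is obtained by power iteration on the n x n Gram matrix, which keeps the computation cheap when the dimension is much larger than the sample size.

```python
import math
import random


def leading_angle_deg(d, n, spike, rng, iters=500):
    """Angle in degrees between the first sample eigenvector and e_1.

    Assumed model (illustrative): rows of X are N(0, diag(spike, 1, ..., 1)).
    """
    scale = math.sqrt(spike)
    X = [[rng.gauss(0.0, 1.0) * (scale if j == 0 else 1.0) for j in range(d)]
         for _ in range(n)]

    # The n x n Gram matrix X X^T has the same nonzero eigenvalues as X^T X.
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):
            G[i][k] = G[k][i] = sum(a * b for a, b in zip(X[i], X[k]))

    # Power iteration for the leading eigenvector v of G.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(G[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # The leading eigenvector of X^T X is proportional to X^T v.
    u = [sum(X[i][j] * v[i] for i in range(n)) for j in range(d)]
    u_norm = math.sqrt(sum(x * x for x in u))

    # The population eigenvector is e_1, so the cosine is |u_1| / ||u||.
    cos = min(1.0, abs(u[0]) / u_norm)
    return math.degrees(math.acos(cos))


def angles_by_ratio(ratios=(0.1, 1.0), d=2000, n=20, seed=0):
    """Map each ratio c = d / (n * spike) to one simulated angle."""
    rng = random.Random(seed)
    return {c: leading_angle_deg(d, n, d / (n * c), rng) for c in ratios}


if __name__ == "__main__":
    for c, angle in angles_by_ratio().items():
        print(f"d/(n*spike) = {c}: angle = {angle:.1f} degrees")
```

Under these assumptions, a smaller ratio should give a sample eigenvector closer to its population counterpart, and a larger ratio should open up the angle. Each call draws a single data set, so the reported angles vary with the seed.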

Original language: English (US)
Pages (from-to): 1747-1770
Number of pages: 24
Journal: Statistica Sinica
Volume: 26
Issue number: 4
State: Published - Oct 2016

Keywords

  • Big data
  • Conical behavior
  • High dimension low sample size
  • PCA

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
