Open access
Author
Bühlmann, Peter L.
Date
2006
Type
- Report
ETH Bibliography
yes
Abstract
We prove that boosting with the squared error loss, L2Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as O(exp(sample size)), provided the true underlying regression function is sparse in terms of the ℓ1-norm of the regression coefficients. In the language of signal processing, this means consistency for de-noising using a strongly overcomplete dictionary if the underlying signal is sparse in terms of the ℓ1-norm. We also propose an AIC-based method for tuning, namely for choosing the number of boosting iterations. This makes L2Boosting computationally attractive, since the algorithm need not be run multiple times for cross-validation, as has been common practice. We demonstrate L2Boosting on simulated data, in particular where the predictor dimension is large relative to the sample size, and on a difficult tumor-classification problem with gene expression microarray data.
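The componentwise version of L2Boosting sketched in the abstract (repeatedly fit the current residuals by the single best least-squares predictor, shrink the update, and repeat) can be illustrated as follows. This is a minimal sketch, not the report's implementation; the step size `nu` and the fixed iteration count `m_stop` are illustrative choices standing in for the AIC-based stopping rule the report proposes.

```python
# Minimal sketch of componentwise L2Boosting for linear models.
# Assumes centered/standardized predictors; `nu` and `m_stop` are
# illustrative hyperparameters, not the paper's AIC-tuned values.
import numpy as np

def l2boost(X, y, m_stop=100, nu=0.1):
    """At each iteration, fit the residuals by the one predictor that
    reduces the squared error most, and add a shrunken update."""
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept            # start from the mean fit
    col_ss = (X ** 2).sum(axis=0)    # per-column sum of squares
    for _ in range(m_stop):
        # Least-squares slope of the residuals on each column of X
        slopes = X.T @ resid / col_ss
        # Residual sum of squares after each candidate update
        rss = ((resid[:, None] - X * slopes) ** 2).sum(axis=0)
        j = rss.argmin()             # best single predictor
        coef[j] += nu * slopes[j]
        resid -= nu * slopes[j] * X[:, j]
    return intercept, coef
```

With a sparse true coefficient vector, the procedure concentrates its updates on the few active predictors, which is the variable-selection behavior the abstract alludes to.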
Permanent link
https://doi.org/10.3929/ethz-a-004680132
Publication status
published
Journal / series
Research Report / Seminar für Statistik, Eidgenössische Technische Hochschule (ETH)
Volume
Publisher
Seminar für Statistik, ETH
Subject
LINEAR STATISTICAL MODELS (MATHEMATICAL STATISTICS); weak greedy algorithm; Binary classification; Lasso; matching pursuit; BOOSTING (MATHEMATICAL STATISTICS); gene expression; overcomplete dictionary; variable selection; sparsity
Organisational unit
02537 - Seminar für Statistik (SfS) / Seminar for Statistics (SfS)
03502 - Bühlmann, Peter L. / Bühlmann, Peter L.
Notes
Also published in: Annals of Statistics 34(2), 559-583.