On Data Reduction of Big Data

Formal Metadata

Title
On Data Reduction of Big Data
Title of Series
Number of Parts
21
Author
License
CC Attribution - NonCommercial - NoDerivatives 4.0 International:
You are free to use, copy, distribute and transmit the work or content in unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Extraordinary amounts of data are being produced in many branches of science. Proven statistical methods are no longer applicable to extraordinarily large data sets because of computational limitations. A critical step in Big Data analysis is data reduction. In this presentation, I will review some existing approaches to data reduction and introduce a new strategy called information-based optimal subdata selection (IBOSS). Under both linear and nonlinear model setups, theoretical results and extensive simulations demonstrate that the IBOSS approach is superior to other approaches in terms of parameter estimation and predictive performance. The tradeoff between accuracy and computational cost is also investigated. When models are misspecified, the performance of different data reduction methods is compared through simulation studies. Ongoing research and some open questions will also be discussed.
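The abstract does not spell out how IBOSS chooses its subdata, so the following is only a minimal sketch under stated assumptions. It assumes the linear-model setting and the extreme-value rule commonly associated with IBOSS: for each covariate in turn, keep the rows with the smallest and the largest values among rows not yet chosen. The function name `iboss_select`, the per-covariate count `r = k // (2 * p)`, and the simulated data are illustrative choices, not taken from the presentation.

```python
import random


def iboss_select(X, k):
    """Return about k row indices of X (a list of rows with p covariates each).

    Assumed rule: for each covariate, take the r smallest and r largest
    values among rows not yet selected, with r = k // (2 * p).
    """
    n, p = len(X), len(X[0])
    r = k // (2 * p)
    if r < 1 or 2 * r * p > n:
        raise ValueError("k must satisfy 2*p <= k and leave enough rows in X")
    remaining = set(range(n))
    selected = []
    for j in range(p):
        # A full sort keeps the sketch short; a partial-selection routine
        # would avoid sorting all remaining rows.
        order = sorted(remaining, key=lambda i: X[i][j])
        picked = order[:r] + order[-r:]
        selected.extend(picked)
        remaining.difference_update(picked)
    return selected


# Hypothetical data: 10,000 rows with 3 standard normal covariates.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(10_000)]
subdata_idx = iboss_select(X, k=60)
```

A model would then be fitted on the rows indexed by `subdata_idx` only. Unlike random subsampling, the selection is deterministic given the data and depends on the covariates alone, not on the response.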