Abstract
In our society today, we are faced with an immeasurable amount of digital information. The analysis of this data provides insights, both on an economic and on a scientific level. The large influx of data helps to identify trends or to deliver new scientific results. The resulting datasets, however, are potentially too large, such that the required statistical analysis becomes impossible to be solved numerically in a timely fashion.
In this thesis we focus on regression analysis, which constitutes a standard method within statistics and machine learning. In a big data context, both the number of observations and the number of variables in these models may increase without limit. A numerically challenging problem within the theory of regression analysis is the solution of the least squares estimator.
During this project, we focussed on the least squares multisplitting (LSMS) method, which allows a large regression problem to be replaced by a set of smaller regression problems, where the splitting happens within the variables. These smaller regression problems are solvable in parallel and communicate with each other to update their local solutions in an iterative fashion. An important shortcoming for an enduser is the lack of clarity in how this variable splitting should be performed. In order to address this partition problem, we applied clustering and graphtheory to provide a solution. We further developed alternative hierarchical methods to iteratively solve the splittingproblem and studied the convergence behaviour.
An important application domain for our methods is signal processing. More precisely, we considered a least squares problem that arises in the estimation of the frequency response function under missing samples. This type of problem has a beautiful statistical solution based on local polynomial approximation. We show that an application of the LSMS provides improvement, but we further developed specific solvers using 2cyclic matrix splittings, which optimally employ the structure of this regression problem. The resulting method improves the arithmetic complexity from cubic to (iterative) loglinear, such that the numerical scalability is no longer an issue.
In this thesis we focus on regression analysis, which constitutes a standard method within statistics and machine learning. In a big data context, both the number of observations and the number of variables in these models may increase without limit. A numerically challenging problem within the theory of regression analysis is the solution of the least squares estimator.
During this project, we focussed on the least squares multisplitting (LSMS) method, which allows a large regression problem to be replaced by a set of smaller regression problems, where the splitting happens within the variables. These smaller regression problems are solvable in parallel and communicate with each other to update their local solutions in an iterative fashion. An important shortcoming for an enduser is the lack of clarity in how this variable splitting should be performed. In order to address this partition problem, we applied clustering and graphtheory to provide a solution. We further developed alternative hierarchical methods to iteratively solve the splittingproblem and studied the convergence behaviour.
An important application domain for our methods is signal processing. More precisely, we considered a least squares problem that arises in the estimation of the frequency response function under missing samples. This type of problem has a beautiful statistical solution based on local polynomial approximation. We show that an application of the LSMS provides improvement, but we further developed specific solvers using 2cyclic matrix splittings, which optimally employ the structure of this regression problem. The resulting method improves the arithmetic complexity from cubic to (iterative) loglinear, such that the numerical scalability is no longer an issue.
Original language  English 

Awarding Institution 

Supervisors/Advisors 

Award date  27 Jun 2022 
Place of Publication  Brussel 
Publisher  
Print ISBNs  9789464443325 
Publication status  Published  2022 