EPFL
Biomedical Imaging Group, STI

Optimized Distributed Hyperparameter Search and Simulation for Lung Texture Classification in CT Using Hadoop

R. Schaer, H. Müller, A. Depeursinge

Journal of Imaging, vol. 2, no. 2, pp. 1-20, June 2016.



Many medical image analysis tasks require complex learning strategies to reach a quality of image-based decision support that is sufficient in clinical practice. The analysis of medical texture in tomographic images, for example of lung tissue, is no exception. Via a learning framework, very good classification accuracy can be obtained, but several parameters need to be optimized. This article describes a practical framework for efficient distributed parameter optimization. The proposed solutions are applicable for many research groups with heterogeneous computing infrastructures and for various machine learning algorithms. These infrastructures can easily be connected via distributed computation frameworks. We use the Hadoop framework to run and distribute both grid and random search strategies for hyperparameter optimization and cross-validations on a cluster of 21 nodes composed of desktop computers and servers. We show that significant speedups of up to 364× compared to a serial execution can be achieved using our in-house Hadoop cluster by distributing the computation and automatically pruning the search space while still identifying the best-performing parameter combinations. To the best of our knowledge, this is the first article presenting practical results in detail for complex data analysis tasks on such a heterogeneous infrastructure together with a linked simulation framework that allows for computing resource planning. The results are directly applicable in many scenarios and allow implementing an efficient and effective strategy for medical (image) data analysis and related learning approaches.
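As a rough illustration of the two search strategies named in the abstract, the sketch below enumerates grid and random candidates and scores each with k-fold cross-validation, dropping a candidate early when a fold falls below a threshold. It is not the authors' Hadoop implementation: the function names, the `prune_below` rule, the parameter names `log10_C` and `log10_gamma`, and the toy scoring function are all assumptions made for this example. Each candidate is evaluated independently of the others, which is the property that makes this kind of search easy to distribute.

```python
import itertools
import random


def grid_candidates(space):
    """Yield every combination of the listed values (grid search)."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_candidates(space, n, seed=0):
    """Yield n combinations drawn uniformly from the listed values (random search)."""
    rng = random.Random(seed)
    names = sorted(space)
    for _ in range(n):
        yield {name: rng.choice(space[name]) for name in names}


def search(candidates, fold_score, n_folds, prune_below=None):
    """Return (best_params, best_mean_score) over k-fold cross-validation.

    Hypothetical pruning rule: if prune_below is set, a candidate is
    abandoned as soon as one of its folds scores below that threshold.
    """
    best_params, best_mean = None, float("-inf")
    for params in candidates:
        scores = []
        for fold in range(n_folds):
            score = fold_score(params, fold)
            if prune_below is not None and score < prune_below:
                break  # pruned: skip the remaining folds for this candidate
            scores.append(score)
        else:
            # Only reached when no fold triggered pruning.
            mean = sum(scores) / len(scores)
            if mean > best_mean:
                best_params, best_mean = params, mean
    return best_params, best_mean


def toy_fold_score(params, fold):
    """Made-up stand-in for training and validating a classifier on one fold."""
    penalty = 0.1 * abs(params["log10_C"] - 1) + 0.1 * abs(params["log10_gamma"] + 2)
    return 1.0 - penalty - 0.01 * fold


# Hypothetical search space, expressed as base-10 exponents.
SPACE = {"log10_C": [-1, 0, 1, 2], "log10_gamma": [-3, -2, -1]}
```

Calling `search(grid_candidates(SPACE), toy_fold_score, n_folds=3)` scans all 12 combinations, while `search(random_candidates(SPACE, 5), toy_fold_score, n_folds=3)` scores only 5 sampled ones. In a distributed setting, the body of the outer loop is the unit of work that would be handed to separate workers.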


@ARTICLE(http://bigwww.epfl.ch/publications/schaer1601.html,
AUTHOR="Schaer, R. and M{\"{u}}ller, H. and Depeursinge, A.",
TITLE="Optimized Distributed Hyperparameter Search and Simulation for
        Lung Texture Classification in {CT} Using {H}adoop",
JOURNAL="Journal of Imaging",
YEAR="2016",
volume="2",
number="2",
pages="1--20",
month="June",
note="")

© 2016 MDPI. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from MDPI.
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.