Download PDF Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB by Wendy L. Martinez



Synopsis

This book is divided into two main sections: pattern discovery and graphical EDA. We first cover linear and nonlinear dimensionality reduction because structure is sometimes discovered, or can only be discovered, with fewer dimensions or features. We include some classical techniques such as principal component analysis, factor analysis, and multidimensional scaling, as well as some of the more recent computationally intensive methods like self-organizing maps, locally linear embedding, isometric feature mapping, and generative topographic maps.
 
Searching the data for insights and information is fundamental to EDA. So, we describe several methods that ‘tour’ the data looking for interesting structure (holes, outliers, clusters, etc.). These are variants of the grand tour and projection pursuit that try to look at the data set in many 2-D or 3-D views in the hope of discovering something interesting and informative. Clustering or unsupervised learning is a standard tool in EDA and data mining. These methods look for groups or clusters, and some of the issues that must be addressed involve determining the number of clusters and the validity or strength of the clusters. Here we cover some of the classical methods such as hierarchical clustering and k-means. We also devote an entire chapter to a newer technique called model-based clustering that includes a way to determine the number of clusters and to assess the resulting clusters.
 
Evaluating the relationship between variables is an important subject in data analysis. We do not cover the standard regression methodology; it is assumed that the reader already understands that subject. Instead, we include a chapter on scatterplot smoothing techniques such as loess. The second section of the book discusses many of the standard techniques of visualization for EDA. The reader will note, however, that graphical techniques, by necessity, are used throughout the book to illustrate ideas and concepts.
 
In this section, we provide some classic, as well as some novel, ways of visualizing the results of the clustering process, such as dendrograms, treemaps, rectangle plots, and ReClus. These visualization techniques can be used to assess the output from the various clustering algorithms that were covered in the first section of the book. Distribution shapes can tell us important things about the underlying phenomena that produced the data. We will look at ways to determine the shape of the distribution by using boxplots, bagplots, q-q plots, histograms, and others.


Content

  1. Introduction to Exploratory Data Analysis
  2. Dimensionality Reduction - Linear Methods
  3. Dimensionality Reduction - Nonlinear Methods
  4. Data Tours
  5. Finding Clusters
  6. Model-Based Clustering
  7. Smoothing Scatterplots
  8. Visualizing Clusters
  9. Distribution Shapes
  10. Multivariate Visualization
  11. Proximity Measures
  12. Software Resources for EDA
  13. Introduction to MATLAB

