Cover
Copyright
Credits
Preface
Part 1. Module 1
Chapter 1. Introducing Data Analysis and Libraries
Data analysis and processing
An overview of the libraries in data analysis
Python libraries in data analysis
Summary
Chapter 2. NumPy Arrays and Vectorized Computation
NumPy arrays
Array functions
Data processing using arrays
Linear algebra with NumPy
NumPy random numbers
Summary
Chapter 3. Data Analysis with Pandas
An overview of the Pandas package
The Pandas data structure
The essential basic functionality
Indexing and selecting data
Computational tools
Working with missing data
Advanced uses of Pandas for data analysis
Summary
Chapter 4. Data Visualization
The matplotlib API primer
Exploring plot types
Legends and annotations
Plotting functions with Pandas
Additional Python data visualization tools
Summary
Chapter 5. Time Series
Time series primer
Working with date and time objects
Resampling time series
Downsampling time series data
Upsampling time series data
Time zone handling
Timedeltas
Time series plotting
Summary
Chapter 6. Interacting with Databases
Interacting with data in text format
Interacting with data in binary format
Interacting with data in MongoDB
Interacting with data in Redis
Summary
Chapter 7. Data Analysis Application Examples
Data munging
Data aggregation
Grouping data
Summary
Chapter 8. Machine Learning Models with scikit-learn
An overview of machine learning models
The scikit-learn modules for different models
Data representation in scikit-learn
Supervised learning – classification and regression
Unsupervised learning – clustering and dimensionality reduction
Measuring prediction performance
Summary
Part 2. Module 2
Chapter 1. Getting Started with Predictive Modelling
Introducing predictive modelling
Applications and examples of predictive modelling
Python and its packages – download and installation
Python and its packages for predictive modelling
IDEs for Python
Summary
Chapter 2. Data Cleaning
Reading the data – variations and examples
Various methods of importing data in Python
The read_csv method
Use cases of the read_csv method
Case 2 – reading a dataset using the open method of Python
Case 3 – reading data from a URL
Case 4 – miscellaneous cases
Basics – summary, dimensions, and structure
Handling missing values
Creating dummy variables
Visualizing a dataset by basic plotting
Summary
Chapter 3. Data Wrangling
Subsetting a dataset
Generating random numbers and their usage
Grouping the data – aggregation, filtering, and transformation
Random sampling – splitting a dataset into training and testing datasets
Concatenating and appending data
Merging/joining datasets
Summary
Chapter 4. Statistical Concepts for Predictive Modelling
Random sampling and the central limit theorem
Hypothesis testing
Chi-square tests
Correlation
Summary
Chapter 5. Linear Regression with Python
Understanding the maths behind linear regression
Making sense of result parameters
Implementing linear regression with Python
Model validation
Handling other issues in linear regression
Summary
Chapter 6. Logistic Regression with Python
Linear regression versus logistic regression
Understanding the math behind logistic regression
Implementing logistic regression with Python
Model validation and evaluation
Model validation
Summary
Chapter 7. Clustering with Python
Introduction to clustering – what, why, and how?
Mathematics behind clustering
Implementing clustering using Python
Fine-tuning the clustering
Summary
Chapter 8. Trees and Random Forests with Python
Introducing decision trees
Understanding the mathematics behind decision trees
Implementing a decision tree with scikit-learn
Understanding and implementing regression trees
Understanding and implementing random forests
Summary
Chapter 9. Best Practices for Predictive Modelling
Best practices for coding
Best practices for data handling
Best practices for algorithms
Best practices for statistics
Best practices for business contexts
Summary
Appendix A. A List of Links
Part 3. Module 3
Chapter 1. A Conceptual Framework for Data Visualization
Data, information, knowledge, and insight
The transformation of data
Data visualization history
How does visualization help decision-making?
Visualization plots
Summary
Chapter 2. Data Analysis and Visualization
Why does visualization require planning?
The Ebola example
A sports example
Creating interesting stories with data
Perception and presentation methods
Some best practices for visualization
Visualization tools in Python
Interactive visualization
Summary
Chapter 3. Getting Started with the Python IDE
The IDE tools in Python
Visualization plots with Anaconda
Interactive visualization packages
Summary
Chapter 4. Numerical Computing and Interactive Plotting
NumPy, SciPy, and MKL functions
Scalar selection
Slicing
Array indexing
Other data structures
Visualization using matplotlib
The visualization example in sports
Summary
Chapter 5. Financial and Statistical Models
The deterministic model
The stochastic model
The threshold model
An overview of statistical and machine learning
Creating animated and interactive plots
Summary
Chapter 6. Statistical and Machine Learning
Classification methods
Understanding linear regression
Linear regression
Decision tree
The Bayes theorem
The Naïve Bayes classifier
The Naïve Bayes classifier using TextBlob
Viewing positive sentiments using word clouds
k-nearest neighbors
Logistic regression
Support vector machines
Principal component analysis
k-means clustering
Summary
Chapter 7. Bioinformatics, Genetics, and Network Models
Directed graphs and multigraphs
The clustering coefficient of graphs
Analysis of social networks
The planar graph test
The directed acyclic graph test
Maximum flow and minimum cut
A genetic programming example
Stochastic block models
Summary
Chapter 8. Advanced Visualization
Computer simulation
Summary
Appendix B. Go Forth and Explore Visualization
An overview of conda
Packages installed with Anaconda
Packages websites
About matplotlib
Bibliography
Index
Last updated: 2021-07-09 18:52:29