Eric Bunch's Blog

Cython Examples: Random Sampling and Latent Dirichlet Allocation

Python is great, but it is sometimes too slow for my needs. In this post, we walk through how to get up and running with Cython and work through some examples, including how to perform fast random sampling (even faster than NumPy in some cases!) and how to implement the collapsed Gibbs sampler for Latent Dirichlet Allocation.

Read More →

Spectral Clustering

In this post we will investigate spectral clustering, which uses the eigenvalue decomposition of a data set's Laplacian matrix. We will look into the eigengap heuristic, which gives guidelines on how many clusters to choose, and work through an example using breast cancer proteome data.

Read More →

Calculating Homology of a Simplicial Complex Using Smith Normal Form

In this post we will define homology and see how to compute it for a simplicial complex using Smith Normal Form.

Read More →

Topological Data Analysis and Persistent Homology

In this post we explore persistent homology and how it is constructed.

Read More →

The Simpsons' Best Episode Ever by the Data

Using Kaggle's Simpsons data set, we determine which episode is the definitive Best Episode Ever!

Read More →

Forecasting new home sales in the U.S.

We use R to forecast the time series of monthly sales of new one-family houses sold in the USA from 1973 to 1996.

Read More →

Luigi vs. Airflow

A comparison of two data pipelining libraries: Spotify's Luigi and Airbnb's Airflow.

Read More →

Breakout Detection by Twitter

Describing Twitter's breakout detection package.

Read More →

Transfer files and directories to an EC2 instance

I'll show how to transfer a file or directory from your computer to an existing EC2 instance.

Read More →

Setting up an Amazon Ubuntu EC2 instance and configuring it with Python 2.7.9 and the SciPy stack

I go through how to set up an Amazon EC2 instance and configure an environment for scientific computation.

Read More →