Python is great, but sometimes it is too slow for my needs. In this post, we will walk through how to get up and running with Cython and go through some examples, including how to perform fast random sampling (even faster than NumPy in some cases!) and how to implement the collapsed Gibbs sampler for Latent Dirichlet Allocation.
In this post we will investigate spectral clustering, which uses the eigenvalue decomposition of the Laplacian matrix of a data set's similarity graph. We will look into the eigengap heuristic, which gives guidelines on how many clusters to choose, and work through an example using breast cancer proteome data.
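As a rough illustration of the eigengap heuristic mentioned above, here is a minimal sketch, not the code from the post itself. It assumes you already have a symmetric, non-negative affinity matrix; the function name `eigengap_num_clusters` and the `max_k` parameter are hypothetical names chosen for this example. It builds the symmetric normalized Laplacian and suggests the cluster count at the largest gap between consecutive eigenvalues.

```python
import numpy as np


def eigengap_num_clusters(affinity, max_k=10):
    """Suggest a cluster count from the largest gap between consecutive
    eigenvalues of the symmetric normalized graph Laplacian.

    `affinity` is assumed to be a symmetric, non-negative similarity matrix.
    """
    affinity = np.asarray(affinity, dtype=float)
    degree = affinity.sum(axis=1)

    # Compute D^{-1/2}, leaving zero-degree (isolated) points at 0.
    safe_degree = np.where(degree > 0, degree, 1.0)
    inv_sqrt = np.where(degree > 0, 1.0 / np.sqrt(safe_degree), 0.0)

    # L_sym = I - D^{-1/2} A D^{-1/2}
    n = affinity.shape[0]
    laplacian = np.eye(n) - inv_sqrt[:, None] * affinity * inv_sqrt[None, :]

    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(laplacian)

    # gaps[i] is the jump from the (i+1)-th to the (i+2)-th smallest eigenvalue,
    # so the largest gap at index i suggests i + 1 clusters.
    gaps = np.diff(eigenvalues[: max_k + 1])
    return int(np.argmax(gaps)) + 1, eigenvalues
```

For an affinity matrix made of well-separated groups, the first few eigenvalues sit near zero and the suggested count matches the number of groups. On real data the gap is often less clear-cut, so treat the result as a guideline rather than an answer.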
In this post we will define homology and see how to compute it for a simplicial complex using Smith Normal Form.
In this post we explore persistent homology and how it is constructed.
Using Kaggle's Simpsons data set, we determine which episode is the definitive Best Episode Ever!
We use R to forecast the time series of monthly sales of new one-family houses in the USA from 1973 to 1996.
A comparison of two data pipelining libraries: Spotify's Luigi and Airbnb's Airflow.
I'll show how to transfer a file or directory from your computer to an existing EC2 instance.
I go through how to set up an Amazon EC2 instance and an environment for scientific computation.