Instagram Clustering

My dad is a nature photographer and has over 1700 of his photos on his Instagram account. I scraped the photos from his account to try out image recommendation (i.e., show me photos similar to a selected photo) and to see whether I could cluster them properly, since a first look suggests there are distinct categories of photo, e.g. landscapes, elk/moose, birds, etc.
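As a rough illustration of the "show me similar photos" idea, here is a minimal sketch that ranks photos by cosine similarity between feature vectors. It assumes each photo has already been turned into a numeric embedding; the photo IDs and three-dimensional vectors below are made-up toy values, not data from the project, and the project itself may have used a different similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, k=3):
    """Return the k photo IDs whose embeddings are closest to the query photo."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vector), photo_id)
        for photo_id, vector in embeddings.items()
        if photo_id != query_id
    ]
    scored.sort(reverse=True)
    return [photo_id for _, photo_id in scored[:k]]


# Hypothetical toy embeddings; real ones would come from an image model.
embeddings = {
    "elk_01": [0.9, 0.1, 0.0],
    "elk_02": [0.8, 0.2, 0.1],
    "bird_01": [0.1, 0.9, 0.2],
    "landscape_01": [0.0, 0.2, 0.9],
}

print(most_similar("elk_01", embeddings, k=2))
```

With real embeddings, the same vectors could also be fed to a clustering algorithm to look for the landscape, elk/moose, and bird groupings.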

Instagram picture

Intel Image Classification

TensorFlow has great tools for image analysis and classification, so I wanted to try them out for myself. To get started, I used the Intel Image Classification dataset and followed a standard approach. Then I used a set of images scraped from Google image search and experimented with transfer learning.



Tensorflow picture

Covid Cases Forecaster

Time series forecasting uses the historical values of a time series to predict future observations. I fit an ARIMA model to the number of daily Covid cases in the US, which I scraped from Wikipedia. I also added the number of daily protests (downloaded from here) as an exogenous variable, which gave the model a better fit to the data.

Covid cases picture

Grocery Cart Recommender

Collaborative filtering recommendation systems are fun, but it's hard to find data for them that isn't proprietary. So I took a clean dataset from Kaggle, transformed the data, fed it to SVD, and tested the results using standard and custom methods.



Grocery picture

Ad Auction

I used Thompson sampling to simulate how a website might employ ad auctions to decide which ad to place on the site based on its expected value. The document shows how to continuously update the model after several campaigns and how this affects the distributions of expected values. This follows this blog entry.
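A minimal sketch of the idea, under assumptions of my own choosing for illustration: each ad has a Beta posterior over its click-through rate, a fixed payout per click, and wins the slot when its sampled expected value is highest. The ad names, payouts, and "true" click rates are hypothetical, and the blog entry's setup may differ.

```python
import random


class Ad:
    """Tracks a Beta(1 + clicks, 1 + misses) posterior over click-through rate."""

    def __init__(self, name, value_per_click):
        self.name = name
        self.value_per_click = value_per_click
        self.clicks = 0
        self.misses = 0

    def sample_expected_value(self, rng):
        # Draw a plausible click-through rate from the posterior,
        # then scale by the payout to get a sampled expected value.
        ctr = rng.betavariate(1 + self.clicks, 1 + self.misses)
        return ctr * self.value_per_click

    def update(self, clicked):
        if clicked:
            self.clicks += 1
        else:
            self.misses += 1


def run_auctions(ads, true_ctr, rounds, rng):
    """Show the ad with the highest sampled expected value, then update it."""
    for _ in range(rounds):
        winner = max(ads, key=lambda ad: ad.sample_expected_value(rng))
        clicked = rng.random() < true_ctr[winner.name]
        winner.update(clicked)


rng = random.Random(0)
# Hypothetical ads and click-through rates, used only to drive the simulation.
ads = [Ad("ad_a", 1.00), Ad("ad_b", 0.50), Ad("ad_c", 2.00)]
true_ctr = {"ad_a": 0.05, "ad_b": 0.12, "ad_c": 0.01}

run_auctions(ads, true_ctr, rounds=2000, rng=rng)
for ad in ads:
    print(ad.name, "shown", ad.clicks + ad.misses, "times,", ad.clicks, "clicks")
```

Because each posterior narrows as an ad collects impressions, the simulation tends to shift traffic toward the ads with the highest expected value while still occasionally exploring the others.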

Ad Auction picture

Marvel Movie Scripts NLP

I used Natural Language Processing (NLP) tools to analyze Marvel movie transcripts from this website. I used the scraped transcripts to do some classification (MCU vs. other Marvel projects) and some Latent Dirichlet Allocation (LDA) to see which movies are most similar to each other. Both approaches ended up relying heavily on character names, which were infeasible to remove properly, mainly because of the inconsistent script structure.



Marvel Logo

Crypto Picker

While I was a Data Science Fellow at Insight Data Science, I worked on a machine learning web application that assists cryptocurrency investors in growing their portfolios. The web app was launched under its own domain name using AWS, but has since been taken down due to costs. You can find the code to analyze the data and build the model in my GitHub repo here and the code to build and launch the web app here.

Crypto picture

Six Degrees to Joe Rogan

This is a study of the interconnected network of podcast hosts and guests, named for Joe Rogan, who is particularly central in the network because he has released over a thousand episodes interviewing people from all walks of life. This website can be used as a podcast discovery tool where you can see all the guests of various podcasts, and all the podcasts that each guest has been on. You can also explore the connections between podcasts, hence the "Six Degrees". There are more in-depth explorations of this dataset in the Advanced section.
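One way to picture the "Six Degrees" connection is as a shortest path in a graph where podcasts and guests are both nodes and each appearance is an edge. The sketch below uses breadth-first search on made-up appearance data; the podcast and guest names are placeholders, and it assumes no podcast shares a name with a guest.

```python
from collections import deque


def build_graph(appearances):
    """Build an undirected graph from (podcast, guest) appearance pairs."""
    graph = {}
    for podcast, guest in appearances:
        graph.setdefault(podcast, set()).add(guest)
        graph.setdefault(guest, set()).add(podcast)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical appearance data for illustration only.
appearances = [
    ("Podcast A", "Guest X"),
    ("Podcast B", "Guest X"),
    ("Podcast B", "Guest Y"),
    ("Podcast C", "Guest Y"),
]
graph = build_graph(appearances)
print(shortest_path(graph, "Podcast A", "Podcast C"))
```

Here Podcast A and Podcast C are linked through two shared guests and one intermediate podcast, so the path alternates between podcast and guest nodes.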



network icon

Kaggle Kernels and Competitions

Kaggle is a website where users can post datasets and analyses, or kernels, both of which can be upvoted so that more users see them. Companies and organizations often post competitions where they supply a dataset and offer cash prizes to whoever submits the best set of predictions. I used this site to practice and refine my Python and machine learning skills. Here are a few of my submissions.

Data picture

Data Visualization

I took a graduate-level Data Visualization class in the Media and Arts Technology department at UCSB. We focused on using MySQL to query databases (specifically, checkout data for the Seattle Public Library for this course) and Processing to display and visualize the data in 2D and 3D environments. For the final project, I used data generated from MESA for a star with the same mass as the Sun, from birth to death. A video of the resulting visualization is presented to the right. Source code.