Machine Learning and Data Science Projects

Jigsaw Puzzle Solver Using Computer Vision (2019)

In this project, we were provided with 198 images of puzzle pieces. Our goal was to determine whether each image contains a puzzle piece, draw a bounding box around the piece, and identify its type by finding its corners. Using this information, we can then attempt to solve the puzzle. The code was written in MATLAB.

Methods Used: Image Segmentation, Morphology, Harris Corner Detection

Teammates: Aaron Cote

Link To Paper: Solve Jigsaw Puzzles Using Computer Vision

GPS Classifier (2018)

We were given a set of GPS data and asked to predict when a car stopped at a stop sign or traffic light, and when it turned left or right. The GPS data was made up of many paths; to reduce the clutter, K-Means was used to agglomerate the paths and mitigate issues related to dilution of precision.

Methods Used: DBSCAN, K-Means, Decision Tree
Teammates: Joe Golden, Niccolo Dehicchio
Link to Paper: Making Predictions with Vehicle's GPS Data
Link to Github: GPS Predictions Github

Using Word Embedding to Automate Recipe Replacement (2018)

In this project, we explore the possibility of creating a model that can identify replacements for an ingredient. For instance, if the query is butter, the model should discover that margarine is similar to butter, making it one potential replacement. The applications are numerous: along with aiding individuals with allergies and dietary restrictions, this can also help cooks understand how to use ingredients they already have. The idea is to transform ingredients into embeddings. With ingredients represented as embeddings, cosine similarity can be used to find the closest replacement.
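The cosine-similarity lookup described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the three-dimensional vectors in `EMBEDDINGS` are invented for the example, and the helper names `cosine_similarity` and `closest_replacements` are hypothetical.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
# Real embeddings would be learned from a recipe corpus.
EMBEDDINGS = {
    "butter": [0.90, 0.10, 0.30],
    "margarine": [0.85, 0.15, 0.35],
    "olive oil": [0.60, 0.40, 0.10],
    "flour": [0.10, 0.90, 0.20],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def closest_replacements(query, embeddings, top_k=3):
    """Rank every other ingredient by cosine similarity to the query."""
    query_vec = embeddings[query]
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With these toy vectors, `closest_replacements("butter", EMBEDDINGS, top_k=1)` ranks margarine first, since its vector points in nearly the same direction as the one for butter.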

Paper Available Below: Using Word Embedding to Automate Recipe Replacement Research Paper
Link to Github: Research Github

Word Sense Disambiguation Using Decision List

The program determines the sense of a word from its context using the decision list method proposed by David Yarowsky. The implementation is based on the paper Decision Lists for Lexical Ambiguity Resolution, in which the author used the method to differentiate between two different types of accent. We apply the same method to predict the sense of words. Currently, the implementation supports only two target words (BASS and SAKE). The dataset used for training the decision list was obtained from R. Sproat's textbook.
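The general shape of a decision list can be sketched as below. This is a simplified illustration under stated assumptions rather than the project's implementation: the only features are the words within a small window around the target, the smoothing constant `alpha` is an arbitrary choice, and the function names are hypothetical. Each feature becomes a rule scored by the absolute log ratio of its smoothed counts under the two senses, and classification applies the strongest rule that matches.

```python
import math
from collections import Counter, defaultdict


def extract_features(tokens, target, window=2):
    """Words within +/- `window` positions of the target (simplified feature set)."""
    feats = set()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            feats.update(tokens[j] for j in range(lo, hi) if j != i)
    return feats


def train_decision_list(examples, target, senses, alpha=0.1):
    """examples: list of (tokens, sense) pairs; senses: the two sense labels."""
    s1, s2 = senses
    counts = defaultdict(Counter)  # feature -> count per sense
    sense_totals = Counter()
    for tokens, sense in examples:
        sense_totals[sense] += 1
        for feat in extract_features(tokens, target):
            counts[feat][sense] += 1

    rules = []
    for feat, c in counts.items():
        # Smoothed log-likelihood ratio; alpha avoids division by zero.
        llr = math.log((c[s1] + alpha) / (c[s2] + alpha))
        if llr != 0:
            rules.append((abs(llr), feat, s1 if llr > 0 else s2))
    rules.sort(key=lambda rule: rule[0], reverse=True)  # strongest evidence first

    default = s1 if sense_totals[s1] >= sense_totals[s2] else s2
    return rules, default


def classify(tokens, target, rules, default):
    """Return the sense of the first (strongest) rule whose feature is present."""
    feats = extract_features(tokens, target)
    for _, feat, sense in rules:
        if feat in feats:
            return sense
    return default  # no rule matched: fall back to the most frequent sense
```

Only the single strongest matching rule decides the sense, which is what distinguishes a decision list from a classifier that combines the evidence of all features.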

Link to Github: WSD Github

Accuracy Matrix

Sentiment Analysis, Genre Prediction, Topic Prediction on Unbalanced Data (2018)

The data set contains sentences extracted from reviews of products, movies, and resources. The reviews came from IMDB, Amazon, and Yelp. We were asked to complete three tasks.

Teammates: Oliver Olonzo, Josh Bickings

Task 1: Predict the polarity of the review (Positive, Negative, Neutral).
Task 2: Predict the event class/type (i.e., Attending_event, Communication_issue, Going_to_places, Legal_issue, None, Money_issue, Outdoor_activity, Personal_care, Fear_of_Physical_pain).
Task 3: Identify the source genre of the sentence (two genres available: genre A or genre B).

Implementation Details and Results Available Below: Sentiment Analysis Implementation Approach and Results

Predicting language of text (Dutch/English) using Decision Tree and AdaBoost (2018)

The purpose of this program is to determine whether a sentence is Dutch or English using a Decision Tree and AdaBoost.

Features Summary:

Testing and Training Sample Details

Link to Paper: Language Prediction using Decision Tree and Adaboost
Link to Github: Language Prediction Github

Finding shortest route for Marathon Runners using A* (2018)

The purpose of this program is to find the shortest route from a start point to an end point defined by the user. The runner must pass through certain points along the way before reaching the final destination. In addition, elevation and terrain type must be considered when finding the shortest path, and the terrain's ability to impede a runner may change depending on the season. Details about the A* heuristic are outlined in the paper linked below.
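One way to picture the search is the sketch below. It is not the project's implementation and makes several assumptions: the map is a small grid whose cell values are hypothetical terrain costs (`None` marks impassable cells), the must-pass points are visited in the order given, elevation and seasonal changes are left out, and the heuristic is simply the Manhattan distance scaled by the cheapest terrain cost rather than the heuristic described in the paper.

```python
import heapq


def a_star(grid, start, goal):
    """A* on a 4-connected grid; grid[r][c] is the cost of stepping onto a cell."""
    rows, cols = len(grid), len(grid[0])
    min_cost = min(c for row in grid for c in row if c is not None)

    def heuristic(cell):
        # Manhattan distance times the cheapest step cost never overestimates.
        return min_cost * (abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))

    open_heap = [(heuristic(start), 0, start)]
    best_g = {start: 0}
    parent = {start: None}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:  # walk parents back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1], g
        if g > best_g[cell]:
            continue  # stale heap entry; a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is not None:
                new_g = g + grid[nr][nc]
                if new_g < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = new_g
                    parent[(nr, nc)] = cell
                    heapq.heappush(
                        open_heap, (new_g + heuristic((nr, nc)), new_g, (nr, nc))
                    )
    return None, float("inf")  # goal unreachable


def route_through(grid, start, waypoints, goal):
    """Chain A* searches so the route passes each waypoint in the given order."""
    stops = [start, *waypoints, goal]
    full_path, total = [start], 0
    for leg_start, leg_goal in zip(stops, stops[1:]):
        leg, cost = a_star(grid, leg_start, leg_goal)
        if leg is None:
            return None, float("inf")
        full_path.extend(leg[1:])  # skip the repeated junction cell
        total += cost
    return full_path, total
```

Seasonal effects could be modeled in this sketch by swapping in a different cost grid per season, since the search itself only depends on the per-cell costs.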

Link to Paper: Shortest Path for Marathon Runner Paper
Figures: The path found is shown in red, and the blue points along the red path represent the "must pass" points.

Fundamental Data Mining Concepts Implementation (2018)

Concepts implemented include:

K Means
Agglomeration
PCA

All implementations are compiled in this Github link: Fundamental Concept Implementations

Grocery Detection