Genisa - An Advanced Information Retrieval System for International Student Recruitment
August 1, 2023
johnfengphd@gmail.com
March 15, 2023
The Toxic Comments Kaggle dataset is a widely used benchmark for multilabel text classification. In this blog post I explore the dataset and apply simple machine learning algorithms, using resampling methods to handle the class imbalance. I also train deep learning models implemented in TensorFlow and PyTorch, and compare the different approaches on the multilabel text classification task.
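To make the resampling idea concrete, here is a minimal sketch of random oversampling for a single binary label, using only the Python standard library. It is not the code from the post: the function name `random_oversample` and its arguments are hypothetical, and for a multilabel problem one possible approach (an assumption here) is to apply it separately for each label.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until both classes match in size."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    # Decide which class is the minority and needs extra copies.
    if len(pos) < len(neg):
        minority, minority_label, majority = pos, 1, neg
    else:
        minority, minority_label, majority = neg, 0, pos

    # Draw with replacement from the minority class to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]

    balanced = ([(s, 1) for s in pos] + [(s, 0) for s in neg]
                + [(s, minority_label) for s in extra])
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [y for _, y in balanced]


# Hypothetical toy data: 2 positive comments out of 10.
texts = [f"comment {i}" for i in range(10)]
flags = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
balanced_texts, balanced_flags = random_oversample(texts, flags)
```

Oversampling should be applied to the training split only, so that duplicated samples do not leak into the evaluation data.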
March 1, 2023
Concepts - Computer Vision, Detection Transformer, Table Detection, Table Extraction, Optical Character Recognition (OCR)
Table extraction from documents using machine learning involves training algorithms to automatically identify and extract tables from a given document. This process can be challenging, as tables can come in various formats and layouts, and may be embedded within larger documents such as research papers, reports, or financial statements.
February 13, 2023
Concepts - NLP, Multilabel Classification, Imbalanced Dataset, Text Embedding
The internet has become an integral part of modern society and has greatly changed the way we communicate and access information. While it provides many benefits, it also has its downsides, one of which is the prevalence of toxic comments. In this project, I address this problem by building a machine learning model that detects toxic comments.
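As an illustration of what "multilabel" means for this task, the sketch below encodes each comment's tags as a multi-hot vector and computes how often each label occurs, which is where the imbalance shows up. The label names are assumed to match the columns of the Kaggle toxic comment dataset, and the helper names and toy data are hypothetical.

```python
# Assumed label set (column names of the Kaggle toxic comment dataset).
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def to_multi_hot(tags):
    """Turn a set of label names into a 0/1 vector ordered like LABELS."""
    unknown = set(tags) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in tags else 0 for label in LABELS]


def positive_rate(rows):
    """Fraction of rows in which each label is set."""
    n = len(rows)
    return {label: sum(row[i] for row in rows) / n for i, label in enumerate(LABELS)}


# Hypothetical toy annotations: most comments carry no label at all.
annotations = [set(), {"toxic"}, set(), {"toxic", "insult"}, set(), set(), set(), set()]
targets = [to_multi_hot(tags) for tags in annotations]
rates = positive_rate(targets)
```

A comment can carry several labels at once or none, so each label is treated as its own yes/no decision rather than one choice among six classes.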
December 28, 2022
Concepts - Monte-Carlo simulation, price modeling, statistical modeling
I built a Monte-Carlo simulation based on the Poisson process to model the price dynamics of a ride-hailing business. The simulations provide projections of the company's earnings and profit over horizons of 1 to 12 months. Strategies for improving profitability are presented at the end of the report.
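The general shape of such a simulation can be sketched as follows. This is a simplified illustration, not the model from the report: ride requests arrive as a Poisson process (simulated by summing exponential waiting times), and the arrival rate, the fixed fare and the per-ride cost are all hypothetical values chosen for the example.

```python
import random
import statistics


def rides_in_period(rng, rate_per_day, days):
    """Count Poisson-process arrivals in `days` by summing exponential gaps."""
    count = 0
    t = rng.expovariate(rate_per_day)  # time of the first ride request
    while t < days:
        count += 1
        t += rng.expovariate(rate_per_day)
    return count


def simulate_profit(months, rate_per_day=20.0, fare=12.0, cost_per_ride=9.0,
                    days_per_month=30, n_runs=200, seed=0):
    """Return (mean, standard deviation) of total profit over `months`."""
    rng = random.Random(seed)
    margin = fare - cost_per_ride  # assumed constant profit per ride
    totals = []
    for _ in range(n_runs):
        rides = sum(rides_in_period(rng, rate_per_day, days_per_month)
                    for _ in range(months))
        totals.append(rides * margin)
    return statistics.mean(totals), statistics.stdev(totals)


mean_profit, profit_spread = simulate_profit(months=12)
```

With these assumed inputs the expected 12-month profit is 20 × 30 × 12 × 3 = 21,600, and the simulated mean should land close to that value. The spread across runs is what a closed-form expectation alone would not give you.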
January 15, 2022
Concepts - Bitcoin, Risk Analysis, Portfolio Optimization
In this work, I analyze Bitcoin as an asset and compare it with more traditional assets such as stocks and gold. Recently there has been considerable interest in Bitcoin as an investable asset due to its high returns over the last decade. I attempt to give a full picture of the asset using the tools and techniques of traditional finance, then draw some insights and recommendations about whether to include it in the portfolio of an investment fund.
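One standard tool from traditional finance for comparing assets on a risk-adjusted basis is the Sharpe ratio. The sketch below is an assumption about the kind of metric such an analysis might use, not a reproduction of the post's method. The function name, the monthly return figures and the zero risk-free rate are hypothetical.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=12):
    """Annualized Sharpe ratio from a list of periodic simple returns."""
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in returns]
    mean_excess = statistics.mean(excess)
    volatility = statistics.stdev(excess)  # sample standard deviation
    return (mean_excess / volatility) * math.sqrt(periods_per_year)


# Hypothetical monthly returns, for illustration only (not real market data).
monthly_returns = [0.01, 0.03, -0.01, 0.02]
ratio = sharpe_ratio(monthly_returns)
```

A higher ratio means more return per unit of volatility. An asset with very high raw returns can still score poorly if its volatility is high enough.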