About Me

My name is Md. Momen Bhuiyan. I will be starting graduate studies at Virginia Tech in the Department of Computer Science this fall. I finished my undergraduate studies at Bangladesh University of Engineering and Technology (BUET). After that, I worked as a Jr. Software Engineer at REVE Systems Ltd. for a year. My research interests are data mining and machine learning, especially deep learning. I completed my undergraduate thesis under the supervision of Prof. Md. Monirul Islam. The thesis topic was modifying a density-based subspace clustering algorithm; specifically, we worked on improving PreDeCon. For the detailed thesis see this. I have also worked on a research project on a distributed traffic simulator for heterogeneous networks, based on Dhaka city traffic, with Himel Dev under Dr. A. B. M. Alim Al Islam. Details of this project can be found here.

I like reading, especially thrillers and science fiction. I participated in several contests during my undergraduate studies. My team placed 3rd in CFICC. We got the best demo award at MoHCI. I have also helped build these sites (BUETECH, Mojadar). I am not very active on online sites. Still, here are some links: @github, @stackoverflow. My programming competition handles are: @codeforces, @topcoder. My social network profiles are: @facebook, @linkedin.

My resume for academia is here and my resume for jobs is here. If any link is broken, contact me at <momen_bhuiyan@yahoo.com> or <lnman000666@gmail.com>.

Undergraduate Thesis: A New Nonparametric Subspace Clustering Algorithm Using Graph and Cost Based Measure

Introduction

Unsupervised learning is widely used in daily life, because most data comes without any labels. To make sense of such data, clustering is widely used in the scientific community. When the data is very high-dimensional, several phenomena make it hard to cluster. On top of that, visualizing high-dimensional data is impossible. Several algorithms have therefore been proposed. One of them is PreDeCon, which extends the idea of DBSCAN using subspace preferences. In our thesis we solve several issues that persist in PreDeCon. A paper is in preparation for "International Journal of Computational Intelligence and Applications" and the current draft can be found here.

Our contribution

Two problems persist in PreDeCon:

Fig. 1. Problems with close clusters

PreDeCon mainly needs four parameters to work: the distance, the number of points within that distance, the length of preferred dimensions, and the variance threshold. To address the problem of estimating these, we derive reasonable parameter values from the given data. This is done in the preprocessing stage of the algorithm: we sample the data several times and average the parameter estimates obtained from the samples. Our assumption for the estimation is that:
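Whatever the specific assumptions, the sample-and-average step itself can be sketched roughly as follows. This is only an illustration, not the estimator used in the thesis: the per-sample heuristic (mean distance to the k-th nearest neighbor, used as a guess for the distance parameter) and the names `estimate_eps`, `k`, `sample_size`, and `rounds` are all hypothetical choices made for this sketch.

```python
import math
import random


def kth_neighbor_distance(points, index, k):
    """Distance from points[index] to its k-th nearest other point."""
    dists = sorted(
        math.dist(points[index], other)
        for j, other in enumerate(points)
        if j != index
    )
    return dists[k - 1]


def estimate_eps_once(sample, k):
    """Hypothetical per-sample estimate: mean k-th neighbor distance."""
    total = sum(kth_neighbor_distance(sample, i, k) for i in range(len(sample)))
    return total / len(sample)


def estimate_eps(points, k=3, sample_size=50, rounds=10, seed=0):
    """Average the per-sample estimate over several random samples."""
    size = min(sample_size, len(points))
    if size <= k:
        raise ValueError("each sample needs more than k points")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    estimates = [
        estimate_eps_once(rng.sample(points, size), k) for _ in range(rounds)
    ]
    return sum(estimates) / len(estimates)
```

Averaging over several samples keeps the cost of the quadratic neighbor search bounded by the sample size and smooths out the effect of any single unlucky sample. The same pattern would apply to the other three parameters, each with its own per-sample estimate.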

To solve the second problem, we propose a change to the PreDeCon algorithm. We add a condition to the cluster core condition and a post-processing stage in which we apply a graph-cutting algorithm to find clusters that can be split under reasonable conditions. For the implementation we used a min-cut algorithm, which should be replaced by a sparsest-cut algorithm.
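As a rough illustration of the post-processing idea, the sketch below computes a global minimum cut of a weighted graph built over the points of one cluster and splits the cluster only if the cut is cheap. Several things here are assumptions made for the sketch and not taken from the thesis: the Stoer-Wagner algorithm as the min-cut routine, the adjacency-matrix input, and the hypothetical `max_cut_weight` threshold that stands in for the "reasonable condition". It does not implement sparsest cut.

```python
def stoer_wagner(weights):
    """Global minimum cut of an undirected weighted graph (Stoer-Wagner).

    weights: symmetric n x n matrix (list of lists), 0 where there is no edge.
    Returns (cut_weight, side), where side lists the vertices on one side.
    """
    n = len(weights)
    w = [row[:] for row in weights]    # work on a copy
    groups = [[i] for i in range(n)]   # original vertices merged into each node
    active = list(range(n))
    best_weight, best_side = float("inf"), []

    while len(active) > 1:
        # One phase: repeatedly add the most tightly connected vertex.
        added = [active[0]]
        conn = {v: w[active[0]][v] for v in active[1:]}
        cut_of_phase = 0
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)
            added.append(nxt)
            for v in conn:
                conn[v] += w[nxt][v]
        s, t = added[-2], added[-1]
        # cut_of_phase separates the last-added node t from everything else.
        if cut_of_phase < best_weight:
            best_weight, best_side = cut_of_phase, groups[t][:]
        # Merge t into s before the next phase.
        groups[s].extend(groups[t])
        for v in active:
            if v != s and v != t:
                w[s][v] += w[t][v]
                w[v][s] = w[s][v]
        active.remove(t)
    return best_weight, sorted(best_side)


def split_if_weakly_connected(weights, max_cut_weight):
    """Return the two sides of the cut if it is cheap enough, else None.

    max_cut_weight is a hypothetical threshold for this sketch.
    """
    cut_weight, side = stoer_wagner(weights)
    if cut_weight > max_cut_weight:
        return None
    rest = [v for v in range(len(weights)) if v not in side]
    return side, rest
```

For example, two triangles with heavy internal edges joined by a single light edge would be split along that light edge, while a single dense triangle would be left intact. A plain minimum cut tends to peel off single weakly attached points, which is one reason a sparsest-cut criterion, which normalizes the cut by the sizes of the two sides, is the better long-term choice.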

Experimental results

We used several datasets from different sources; a full listing can be found in the thesis. We judged the results with several evaluation techniques. The experiments show good results on most evaluation techniques for these datasets. Sample results are given below.