This project was done as the major project for the Information Retrieval and Extraction course at IIIT Hyderabad for the 2016-17 session.
The course is taught by Prof. Vasudeva Varma and co-taught by Prof. Manish Gupta.
Recently, there has been increasing interest in using Deep Learning (DL) techniques to analyze social graphs such as Flickr, YouTube, and Twitter. The appeal of this approach is that once DL has been used to learn node representations, several network mining tasks, such as node classification, link prediction, node visualization, and node recommendation, can be solved with conventional machine learning algorithms.
In this project, we build a model that captures the network information of a node in an efficient and scalable manner. The learned representations are then used for node classification.
We exploit the label information in the data to learn a better representation targeted specifically at the classification task.
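As a rough illustration of how learned node embeddings can feed a conventional classifier, the sketch below uses a nearest-centroid rule in plain Python. The classifier choice, the function names (`fit_centroids`, `predict`), and the toy two-dimensional vectors are all assumptions made for illustration; they are not taken from the project's code or data.

```python
import math
from collections import defaultdict


def fit_centroids(embeddings, labels):
    """Average the embedding vectors of the labelled nodes, per class."""
    sums = {}
    counts = defaultdict(int)
    for node, label in labels.items():
        vec = embeddings[node]
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vec):
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Hypothetical toy embeddings (node id -> vector); real ones would come
# from the trained embedding model.
embeddings = {
    "n1": [0.9, 0.1],
    "n2": [0.8, 0.2],
    "n3": [0.1, 0.9],
    "n4": [0.2, 0.8],
    "n5": [0.7, 0.3],  # unlabelled node to classify
}
labels = {"n1": "pos", "n2": "pos", "n3": "neg", "n4": "neg"}

centroids = fit_centroids(embeddings, labels)
predicted = predict(centroids, embeddings["n5"])
```

Any standard classifier (for example logistic regression) could take the place of the nearest-centroid rule; the point is only that the embedding vectors serve as ordinary feature vectors.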
This project studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale to real-world information networks, which usually contain millions of nodes. In this project, we implemented a network embedding method called PTE (Predictive Text Embedding), which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted.
The method optimizes a carefully designed objective function that preserves both the local and global network structures. It uses an edge-sampling algorithm that addresses the limitation of classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. We test the method on the IMDB movie review dataset. The novelty of PTE is that it exploits the label information in the graph to fine-tune the embeddings.
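The edge-sampling idea can be sketched as follows: instead of scaling each gradient by the edge weight, edges are drawn with probability proportional to their weight and each drawn edge is then treated as an unweighted edge in the SGD update. The sketch below uses the alias method for constant-time sampling; that choice, the function names (`build_alias_table`, `sample_edge`), and the toy edge list are assumptions for illustration and not necessarily what this project's code does.

```python
import random


def build_alias_table(weights):
    """Build alias-method tables for sampling index i with
    probability weights[i] / sum(weights)."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # The large entry donates the mass needed to fill bucket s.
        scaled[l] -= 1.0 - scaled[s]
        if scaled[l] < 1.0:
            small.append(l)
        else:
            large.append(l)
    # Whatever is left over fills its own bucket completely.
    for i in small + large:
        prob[i] = 1.0
    return prob, alias


def sample_edge(edges, prob, alias, rng=random):
    """Draw one edge in O(1) time using the alias tables."""
    i = rng.randrange(len(edges))
    if rng.random() >= prob[i]:
        i = alias[i]
    return edges[i]


# Hypothetical toy graph: (source, target) pairs with matching weights.
edges = [("a", "b"), ("b", "c")]
weights = [1.0, 3.0]

prob, alias = build_alias_table(weights)
rng = random.Random(0)
samples = [sample_edge(edges, prob, alias, rng) for _ in range(20000)]
heavy_fraction = samples.count(("b", "c")) / len(samples)
```

With these weights the heavier edge should be drawn roughly three times as often as the lighter one. Because every sampled edge contributes a gradient of similar magnitude, a single learning rate works across edges with very different weights.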
Better node representations help solve various network mining tasks with conventional machine learning algorithms. They can be used for:
Node classification
Link prediction
Node visualization
Node recommendation
Links to project resources
Project Description (video)
Project Description (slides)
Support or Contact
For any query related to the project, please feel free to contact any of the authors. Contact information:
Shashank Gupta - email@example.com
Nishant Prateek - firstname.lastname@example.org
Karan Chandnani - email@example.com