Department of Computer Science

REU Projects


Big Data Analytics at Oklahoma State University - Projects

Students will participate in specific research projects related to big data analytics within the overall cohort research experience. They will be introduced to the fundamentals of big data analytics and work on real-world projects, gaining hands-on experience with Python and MATLAB libraries for a range of machine learning and data mining models. With the help of research mentors, they will generate a hypothesis, design and implement a research plan to test it, and analyze the results. Students who engage in any of the tasks will be given co-authorship on the resulting research papers.

 

Research projects are divided into three different categories based on the domain of the data: network, health and image data. We present brief descriptions of various projects below:

 

Network Data Analysis

Networks are effectively used to represent relationships and dependence between individual units (i.e., vertices). Node classification, community detection, and link prediction are some applications of network analysis in many different areas, such as social networks and biological networks.
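
As a rough illustration of link prediction, the sketch below scores each pair of unlinked vertices by the Jaccard overlap of their neighbor sets, a common baseline heuristic. It is not the method used in any project listed here; the function name `jaccard_scores` and the toy collaboration network are assumptions made purely for illustration.

```python
from itertools import combinations


def jaccard_scores(edges):
    """Score every unlinked node pair by the Jaccard overlap of their neighbor sets."""
    # Build an undirected adjacency map: node -> set of neighbors.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # the pair is already linked, so there is nothing to predict
        common = neighbors[u] & neighbors[v]
        union = neighbors[u] | neighbors[v]
        scores[(u, v)] = len(common) / len(union) if union else 0.0
    return scores


# Hypothetical toy collaboration network (names are illustrative only).
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan"), ("dan", "eve")]
scores = jaccard_scores(edges)

# Rank candidate links from most to least likely under this heuristic.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for pair, score in ranked:
    print(pair, round(score, 3))
```

Pairs that share many neighbors relative to their combined neighborhood rank highest. On networks at the scale of Reddit or DBLP, scoring all pairs is too expensive, so candidates are usually restricted (for example, to vertices two hops apart).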

 

Students will be trained to analyze and develop methods for research on massive real-world social, collaboration, and information networks, including online social networks such as BlogCatalog, Reddit, Twitter, and Gab, and scientific networks such as DBLP and Cora. Some of the projects in this area are listed below.

 

Enhancing security awareness in autonomous systems by applying an Explainable AI model (Dr. Sharmin Jahan)

 

Research on autonomous systems is gaining popularity because such systems can remain resilient while operating in dynamic environments. The main advantage of an autonomous system is that it adjusts its functionality with little or no human intervention, a process we call adaptation. However, adaptation can introduce new security vulnerabilities, so security has become one of the primary concerns for autonomous systems. The system should know its own security profile, interpret the security state of its operational environment, and assess which adaptation is best. Because a dynamic environment involves uncertainty that cannot always be predicted in advance, predefined rules are not enough to interpret it. This research aims to embed security awareness by using an explainable AI model to interpret a dynamic, uncertain operational environment, which complements the choice of an optimal adaptation that lets the system continue operating while maintaining its security capabilities. REU students will explore challenges related to explainable AI in security across different application domains, along with potential solutions to those challenges.

 

Quantifying Online Polarization (Dr. Bagavathi)

Social media has amplified the polarized views of society and has had catastrophic effects on both individual and collective responses to events like COVID-19. By tracing the propagation of polarized news from different media outlets to social media, we will be able to capture the footprint of political bias and quantify its impact on problems like rumor detection. Some pressing questions in computational social science research are how misinformation originates and why it goes viral on social media. In this work, we aim to add a dimension to this line of research by mapping and quantifying polarization networks in social media. This project will establish the correspondence among news media content, social media user responses to that content, and user interaction dynamics in order to quantify online social media polarization.

 

REU students involved in this project will work under the supervision of Dr. Bagavathi and alongside his graduate students to achieve these objectives. They will learn state-of-the-art text and graph mining methods, learn to understand and develop methodologies for social media data, and be introduced to the computational social science domain. Furthermore, these projects will give undergraduate students an arena to think about and tackle ongoing polarized and hateful situations on social media.

 

Health Data Analysis

Health data analysis, also known as clinical data analysis, involves the extrapolation of actionable insights from sets of patient data, typically collected from electronic health records. An electronic health record (EHR) is a systematized collection of patient and population health information stored in digital form. Collecting and labeling data, designing accessible data formats and storage, and developing predictive models for different diseases are some of the specific, time-consuming challenges faced by the research community. Moreover, producing results that the medical community deems trustworthy requires domain-specific knowledge, which is another major challenge in analyzing healthcare data.

 

Extracting Information from EHR Data (Dr. Shamsuddin)

EHR data is a valuable source for any medical research. However, its quality and ease of use are questionable. Thus, we propose to develop a standard framework for converting EHR data into easily accessible patient profiles. In one of our previous works, we define a patient profile as a low-dimensional data structure that contains a relevant summary of each patient. The quality of the patient profiles will be determined by how accurately various machine learning models can map each patient profile to the corresponding diagnosis. This will facilitate the use of objective decision-support systems that use EHR data in real-life clinical practice.

 

Students will measure the quality of the patient profiles by testing how accurately various machine learning models map each profile to the corresponding diagnosis. We will work with publicly available EHR data from the PhysioNet Challenges (e.g., Early Prediction of Sepsis from clinical data). Moreover, healthcare data often has missing information or attribute values, since data collection is a lengthy, time-consuming process. Students will learn how to use statistical tools (such as mean, median, variance, and temporal relations) to fill in missing data.
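
To make the idea of filling in missing values concrete, here is a minimal sketch using only the Python standard library. It shows constant imputation with the mean or median of the observed values, plus a forward fill that carries the last observed measurement forward. The forward fill is one assumed reading of "temporal relations", not necessarily the project's approach. The function names and the `heart_rate` readings are hypothetical.

```python
from statistics import mean, median


def impute_constant(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing observed, so nothing to impute from
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


def impute_forward(values):
    """Carry the last observed value forward in time; leading gaps stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Hypothetical hourly heart-rate readings for one patient (None = not recorded).
heart_rate = [None, 80, None, 90, 130, None]

print(impute_constant(heart_rate, "mean"))
print(impute_constant(heart_rate, "median"))
print(impute_forward(heart_rate))
```

The median is less sensitive than the mean to outliers such as the 130 reading, while the forward fill preserves the time ordering of measurements at the cost of leaving leading gaps unfilled.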

 

Visual Data Analysis

The visual spectrum is a rich source of information, encoding different modes of knowledge, such as physics, geometry, and semantics, into a concise representation of pixels. An image or a video often conveys a great deal about the scene, such as the background, activity, location, and context that describe the primary semantic content of the captured scene. With the evolution of cameras in everyday devices like mobile phones and of social media platforms like YouTube and Instagram, there is a tremendous amount of unstructured and unlabeled image and video data that can be leveraged to train deep learning models for visual understanding. In the projects of Drs. Aakur and Crick, students will be trained to analyze massive image and video data.

 

Self-supervised Predictive Learning for Visual Understanding (Dr. Aakur)

In this project, we aim to apply state-of-the-art deep learning techniques like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) in an unsupervised representation learning paradigm to extract meaningful visual representations of unlabeled data from the internet for better visual understanding. Traditional approaches to visual understanding require structured, labeled data for training and, as such, can be prone to labeling errors. The overall goal of this research is to create a real-time visual event understanding system for streaming videos, using continuous-valued deep learning algorithms to build unsupervised representations of visual data.

 

Students will implement a predictive coding stack that will form the basis for learning representations in a self-supervised manner. They will integrate multiple modalities such as text and audio into the prediction framework to help ground the concepts in visual data.

 

Explainable and engineerable machine learning (Dr. Crick)

Machine learning has become a vast engine of the economy, owing to the availability of big data and developments in deep convolutional neural networks. However, CNNs have serious problems, in that they consist of collections of hundreds of thousands of parameters which are completely opaque to human inspection. Even if CNNs do well in learning tasks, the fact that we cannot follow their decision procedure renders their decisions less trustworthy, and also impedes our efforts to engineer systems that could accomplish the same tasks with fewer resources than CNNs require. Investigations into the semantic structure of such networks – what information is contained where, how it can be intelligently adjusted, when decisions are made and on which basis – are urgently needed.

 

This data analysis project will involve constructing and analyzing deep neural networks to identify patterns of learning behavior, attempting to construct structures for specific purposes, and tracking patterns of performance in transfer learning from one context to another. Research projects will include defense against malicious attacks, using generative networks for training and introspection, and examining the interrelationship and performance of different network architectures such as convolutional, autoencoding, long short-term memory, and others.
