2025 Student Research Projects | Columbia Engineering

Institution: Columbia

Mentor: Baishakhi Ray (Columbia)

Knowledge Graphs Aid Patch Refinement in Autonomous Program Repair

The use of large language models (LLMs) in software engineering tasks such as autonomous program repair has drastically increased in recent years. Although LLM-based solutions have shown impressive results on difficult benchmarks (e.g., SWE-bench), language models are inherently limited when generating patches for issues in massive code repositories due to their inability to effectively track imports and data flow across files and their diminishing performance when handling longer context windows.

To address these limitations, we supplement an existing agentic program repair workflow, namely SemAgent (Pabba et al., 2025), with a repository-level knowledge graph that efficiently stores cross-file import relations, call trees, and data and control flows. We allow the repair agent to extract detailed call trees from this graph during the consistency check phase of the workflow, enabling a thorough examination of all code locations that may need to be altered in order to maintain consistency with a preliminary patch or handle edge cases that were not originally accounted for.

Thus, by using knowledge graphs to handle cross-file import relations and break full-file examinations into smaller, chunked steps, we are able to address the limitations posed by pure-LLM approaches and improve the quality of the final generated patch.
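As a rough illustration of the call-tree extraction described above, the sketch below builds a tiny cross-file call graph with Python's standard `ast` module and walks it from one function. It is not SemAgent's implementation or the project's knowledge graph: the file names, source snippets, and helper names (`build_call_graph`, `call_tree`, `callers_of`) are hypothetical, and only direct calls by bare name are recorded.

```python
import ast
from collections import defaultdict

def build_call_graph(sources):
    """Map each function name to the set of bare function names it calls.

    `sources` maps a (hypothetical) file name to its Python source text.
    Method calls such as `obj.method()` are ignored in this sketch.
    """
    graph = defaultdict(set)
    for text in sources.values():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, ast.FunctionDef):
                for inner in ast.walk(node):
                    if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                        graph[node.name].add(inner.func.id)
    return graph

def call_tree(graph, root, seen=None):
    """Return the nested call tree under `root`, cutting cycles."""
    seen = (seen or set()) | {root}
    return {callee: call_tree(graph, callee, seen)
            for callee in sorted(graph.get(root, ())) if callee not in seen}

def callers_of(graph, target):
    """List functions that call `target`, i.e. places a patch may affect."""
    return sorted(f for f, callees in graph.items() if target in callees)

# Hypothetical three-file repository.
SOURCES = {
    "parser.py": "def parse(text):\n    return normalize(text).split()\n",
    "utils.py": "def normalize(text):\n    return text.strip().lower()\n",
    "cli.py": ("def main(argv):\n    return report(parse(argv[0]))\n\n"
               "def report(tokens):\n    return len(tokens)\n"),
}
graph = build_call_graph(SOURCES)
tree = call_tree(graph, "main")
affected = callers_of(graph, "normalize")
```

If a preliminary patch changed `normalize`, `callers_of` would point the agent at `parse` as a location to re-check, which is the kind of small, chunked step the abstract describes.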

Robert Boada

Institution: Columbia

Mentor: Suman Jana (Columbia)

LLM Fuzzing

Fuzzing is a dynamic software-testing technique that exposes program vulnerabilities by repeatedly supplying unexpected or malformed inputs and monitoring for crashes or anomalous behavior. Fuzzers have traditionally been driven by randomized or coverage-guided mutations; recent work has begun to integrate large language models (LLMs) to enhance their ability to generate syntactically valid, semantically meaningful test cases.

By leveraging LLMs’ deep understanding of programming languages, data formats, and protocol specifications, modern fuzzing frameworks can produce high-quality seed inputs, prioritize mutation strategies toward unexplored code paths, and adaptively learn from execution feedback.
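The basic loop behind the technique can be sketched in a few lines. The example below is a minimal random-mutation fuzzer, assuming a hypothetical target (`toy_parser`) with a deliberate bounds bug; it uses no coverage feedback and makes no LLM calls. In an LLM-assisted setup, the `seeds` list is one place where model-generated inputs could be supplied.

```python
import random

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random mutation: flip a bit, insert a byte, or delete a byte."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
    elif choice == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif buf:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)

def fuzz(target, seeds, iterations=2000, seed=0):
    """Feed mutated seeds to `target` and record inputs that raise."""
    rng = random.Random(seed)
    corpus = list(seeds)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            target(candidate)
        except Exception as exc:  # any uncaught exception counts as a crash
            crashes.append((candidate, repr(exc)))
    return crashes

def toy_parser(data: bytes):
    """Hypothetical target: b"LEN:<n>;" followed by a payload of n bytes."""
    header, _, payload = data.partition(b";")
    if not header.startswith(b"LEN:"):
        return None                  # malformed header is rejected cleanly
    try:
        length = int(header[4:])
    except ValueError:
        return None
    return payload[length - 1]       # bug: length is never bounds-checked

crashes = fuzz(toy_parser, [b"LEN:3;abc"])
print(f"{len(crashes)} crashing inputs found")
```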

Ashley Chen

Institution: NYU

Mentor: Xia Zhou (Columbia)

Acoustic Embedding for Deepfake Detection and Prevention

Falsified videos, deepfakes in particular, have become widespread and fairly easy to produce in the last couple of years. Deepfakes can exploit the platforms of highly influential figures by impersonating them, which has led to many instances of financial loss and political disruption. We therefore propose a physical signature framework that creates and physically embeds dynamic signatures in order to secure videos at the moment of their digital creation. Specifically, my project this summer focused on audio, using echo hiding to encode live transcriptions of speeches in audio playback.
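To make the echo-hiding idea concrete, here is a minimal sketch under simplifying assumptions: the host audio is synthetic white noise rather than speech, each bit occupies one fixed-length segment, and the decoder compares autocorrelation at the two candidate delays (practical decoders commonly use the cepstrum instead). The segment length, delays, and echo strength are illustrative values, not the project's parameters.

```python
import random

SEG_LEN, D0, D1, ALPHA = 1024, 40, 80, 0.6   # illustrative parameters

def text_to_bits(text):
    return [(byte >> k) & 1 for byte in text.encode("utf-8") for k in range(7, -1, -1)]

def bits_to_text(bits):
    data = bytes(sum(b << (7 - k) for k, b in enumerate(bits[i:i + 8]))
                 for i in range(0, len(bits), 8))
    return data.decode("utf-8")

def embed_bits(signal, bits):
    """Add a faint echo to each segment; the echo delay (D0 or D1) encodes the bit."""
    out = []
    for i, bit in enumerate(bits):
        seg = signal[i * SEG_LEN:(i + 1) * SEG_LEN]
        delay = D1 if bit else D0
        out.extend(s + (ALPHA * seg[n - delay] if n >= delay else 0.0)
                   for n, s in enumerate(seg))
    return out

def autocorr(seg, lag):
    return sum(seg[n] * seg[n - lag] for n in range(lag, len(seg)))

def extract_bits(stego, n_bits):
    """Decide each bit by which candidate delay shows the stronger self-similarity."""
    bits = []
    for i in range(n_bits):
        seg = stego[i * SEG_LEN:(i + 1) * SEG_LEN]
        bits.append(1 if autocorr(seg, D1) > autocorr(seg, D0) else 0)
    return bits

message = "hi"                                   # stands in for a transcript fragment
bits = text_to_bits(message)
rng = random.Random(0)
host = [rng.gauss(0.0, 1.0) for _ in range(SEG_LEN * len(bits))]
stego = embed_bits(host, bits)
recovered = bits_to_text(extract_bits(stego, len(bits)))
```

The echo strength here is chosen for reliable decoding on noise; keeping the echo inaudible on real speech would call for smaller delays and amplitudes and a more careful decoder.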

Daniella Cardenas

Institution: Columbia

Mentor: Lucy Simko (Barnard)

Understanding Today’s Immigrants’ Digital Security and Privacy Experiences

The U.S. government administration has issued a series of restrictive immigration policies since January 2025. According to the United States’ official ICE website, during the first 100 days of President Trump’s second term (January 20-April 20, 2025), 66,463 immigrants were arrested and 65,682 were deported. Three months later, President Trump’s reconciliation bill, self-coined the ‘Big Beautiful Bill’, allocated approximately $170 billion to immigration and border enforcement, including $42.5 billion going towards the construction of new detention facilities.

This study examines how immigrants in the U.S. may be adapting their digital privacy and security practices in response to these political changes. Through in-person flyering, research platform recruitment, and personal networks, we conducted semi-structured, hour-long interviews (n=11) with recent immigrants of differing legal statuses and from diverse regions (e.g., China, Ecuador, Russia).

To ensure the safety of our participants, we obtained informed consent, redacted any identifying information from interview transcripts, and stored all research data locally. Our analysis reveals that participants exhibit widespread institutional distrust aimed at the government, social media companies, and data brokers; demonstrate heightened levels of hyper-vigilance both online and offline; and employ extensive self-censorship ranging from limiting their immigration and political conversations online to opting out of entire social media platforms in the effort to protect both themselves and their communities (family, friends, other immigrants).

These findings highlight how current immigration policies erode the digital and physical autonomy of many immigrants, underscoring the need for privacy-preserving technologies and data policy reforms. In the fall semester, we will conduct additional interviews until we reach thematic saturation, then synthesize our findings and recommendations into a paper that we will submit for publication. Through this paper, we aim to mitigate the disproportionate burden of security and privacy placed on immigrants, ultimately fostering safer digital spaces for marginalized communities.

Arav Dhoot

Institution: Columbia

Mentor: Junfeng Yang (Columbia)

Grounded Guidance: Preparing an Evaluation Dataset to Reduce Hallucinations in Columbia’s Travel and Expenses Chatbot

Columbia University is developing a Retrieval-Augmented Generation (RAG) chatbot to support Travel and Expenses (T&E) queries, but the non-deterministic nature of large language models can lead to hallucinated responses that mislead users and undermine institutional trust. To address this, we created a high-quality golden dataset by manually validating over 3,600 historical QA pairs, originally labeled by an LLM-based classifier, and found that nearly 35% of the supposedly valid responses were incorrectly labeled.

We then trained a RoBERTa-based binary classifier on a balanced and preprocessed subset of the data, achieving a validation accuracy of 84.92% with a frozen encoder and optimized hyperparameters. We also tested architectural variants, identifying the no-dropout model as a strong candidate for further refinement. In the next phase, we plan to combine model-based error detection and automated prompt tuning via DSPy to improve annotation consistency and build a generalizable, unsupervised labeling framework.

Laszlo Godde

Institution: Columbia

Mentor: Junfeng Yang (Columbia)

Fuzzing Large Language Models for Robustness

The range of applications of modern large language models (LLMs) is continually expanding; however, they may falter unpredictably when processing malformed or adversarial text inputs. This project focuses on creating and analyzing automated fuzzing frameworks designed specifically for transformer-based LLMs. We create and implement novel input-mutation techniques, including paraphrase swaps, character corruptions, and logical rephrasings. We integrate these techniques into a comprehensive system that issues modified prompts at scale, gathers responses, and monitors for behavioral divergence.

We applied the system to two open-source LLMs and benchmarked question answering, instruction following, and code generation; our methods revealed previously undetected failure modes at rates as high as 23%. We further classify these errors into hallucinations, instruction leaks, and format violations. On a single GPU, we achieve a throughput of 5,000 prompts per hour. Through this project, we provide (1) the mutation-operator library, (2) the evaluation harness, and (3) an error taxonomy, which together lay a foundation for adversarial evaluation of LLMs.
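The mutate, query, and compare loop described in this abstract can be sketched as follows. This is only an illustration, not the project's library: `toy_model` is a brittle stand-in for a real LLM call, the paraphrase table is hypothetical, and divergence is judged by simple string inequality.

```python
import random

def corrupt_characters(prompt, rng, rate=0.1):
    """Character corruption: swap adjacent letters at roughly `rate` of positions."""
    chars = list(prompt)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    return "".join(chars)

PARAPHRASES = {"What is": ["Tell me"]}           # hypothetical swap table

def swap_paraphrase(prompt, rng):
    """Paraphrase swap: replace a known phrase with an alternative wording."""
    for phrase, alternatives in PARAPHRASES.items():
        if phrase in prompt:
            prompt = prompt.replace(phrase, rng.choice(alternatives))
    return prompt

def fuzz_model(model, prompts, mutators, seed=0):
    """Query the model on each prompt and its mutations; record divergent answers."""
    rng = random.Random(seed)
    findings = []
    for prompt in prompts:
        baseline = model(prompt)
        for name, mutate in mutators.items():
            mutated = mutate(prompt, rng)
            response = model(mutated)
            if response != baseline:
                findings.append({"mutator": name, "prompt": prompt,
                                 "mutated": mutated, "response": response})
    return findings

def toy_model(prompt):
    """Stand-in for an LLM: answers only one exact phrasing."""
    return "4" if prompt.startswith("What is 2 + 2") else "unknown"

findings = fuzz_model(
    toy_model,
    ["What is 2 + 2?"],
    {"paraphrase": swap_paraphrase, "char_corruption": corrupt_characters},
)
```

For real models, exact string comparison would be too strict; a semantic equivalence check would take the place of `response != baseline`.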

Angelina Gargano

Institution: Barnard

Mentor: Lucy Simko (Barnard)

Security, Privacy, and Safety of Student Journalists

Student reporting and publications play a vital role in journalism. However, student journalists’ dual identity as both students and journalists also exposes them to unique privacy and safety risks. Through semi-structured interviews with student journalists at Barnard College and Columbia University, we began to understand their experiences, needs, and mitigation strategies regarding safety, privacy, and security.

The team conducted 18 interviews (9 this summer) and transcribed them using locally downloaded software. To analyze the transcripts, we conducted reflexive thematic analysis. First, we read through the transcripts and extracted themes to create a comprehensive thematic codebook. We then started a thematic analysis on the interview transcripts by applying the codes systematically.

Preliminary emerging themes regarding perceived security and privacy threats facing student journalists include digital harassment, threats from law enforcement, academic disciplinary action, doxxing, compromise of sources, and compromise of data. We found that student publications also have a number of mitigation strategies for these risks such as limiting organizational access to sensitive data, internal rules regarding publishing practices, support from external organizations, and physical safety measures such as press passes when reporting in the field.

Additionally, the intersection between students’ personal identity and their journalistic identity played a role in their threat models. Student journalists’ perceived security and privacy risks depended on factors such as race, ethnicity, and citizenship status. The participants also expressed student-specific concerns such as pressure to report on peers and a lack of institutional knowledge due to rapid turnover among publication members. Our preliminary recommendations for technologists include improving the usability of open-source software for secure data storage and LLM transcription, which would strengthen student journalists’ security and privacy in the digital world.

Dylan Tran

Institution: Columbia

Mentor: Nikhil Garg (Cornell Tech)

Dynamically-Updating Sparse Autoencoder for Modeling Concept Drift in Social Media

This project develops a dynamically-updating sparse autoencoder to model evolving topics and capture trends in social media. Traditional SAEs require full retraining to adapt to new data, making them computationally expensive and prone to catastrophic forgetting, which prevents lineage tracking for concept evolution. Our method incrementally updates the autoencoder each year using ArXiv and Bluesky datasets, growing the neural network to capture new concepts and information each year.

We track biological-style neuron lineages by mapping how each concept splits, merges, and evolves. This approach reduces compute cost by training only the new neurons instead of the whole neural network, preserves interpretability via sparse activations, and offers a framework for real-time applications such as a new browsing mode to support feed personalization and a trend detector on social media.
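The "train only the new neurons" idea can be illustrated with a very small sparse autoencoder written in plain Python. This is a sketch under stated assumptions, not the project's model: the class name `GrowingSAE`, the four-dimensional toy vectors, the learning rate, and the L1 weight are all hypothetical, and split/merge lineage tracking is reduced to recording the period in which each neuron was added.

```python
import copy
import random

class GrowingSAE:
    """ReLU sparse autoencoder that grows by appending neurons and freezing old ones."""

    def __init__(self, dim, rng):
        self.dim, self.rng = dim, rng
        self.enc, self.dec = [], []   # one weight row of length `dim` per neuron
        self.frozen = []              # True for neurons added in earlier periods
        self.born = []                # period in which each neuron was added

    def grow(self, n_new, period):
        self.frozen = [True] * len(self.enc)          # freeze everything existing
        for _ in range(n_new):
            self.enc.append([self.rng.gauss(0, 0.1) for _ in range(self.dim)])
            self.dec.append([self.rng.gauss(0, 0.1) for _ in range(self.dim)])
            self.frozen.append(False)
            self.born.append(period)

    def encode(self, x):
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.enc]

    def decode(self, h):
        return [sum(h[k] * self.dec[k][j] for k in range(len(h))) for j in range(self.dim)]

    def train_step(self, x, lr=0.05, l1=0.01):
        """One gradient step on 0.5*||recon - x||^2 + l1*sum(h), new neurons only."""
        h = self.encode(x)
        err = [r - v for r, v in zip(self.decode(h), x)]
        for k, act in enumerate(h):
            if self.frozen[k]:
                continue                               # old neurons stay untouched
            grad_h = sum(e * d for e, d in zip(err, self.dec[k])) + l1
            for j in range(self.dim):
                self.dec[k][j] -= lr * err[j] * act
                if act > 0:                            # ReLU passes gradient only when active
                    self.enc[k][j] -= lr * grad_h * x[j]
        return 0.5 * sum(e * e for e in err)

# Hypothetical embeddings for two yearly snapshots.
data_2023 = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
data_2024 = [[0.5, 0.5, 1.0, 0.0]]

sae = GrowingSAE(dim=4, rng=random.Random(0))
sae.grow(2, period=2023)
for _ in range(50):
    for x in data_2023:
        sae.train_step(x)
old_weights = copy.deepcopy((sae.enc[:2], sae.dec[:2]))

sae.grow(2, period=2024)
for _ in range(50):
    for x in data_2024:
        sae.train_step(x)
```

Because the 2023 neurons are frozen before the 2024 update, their weights (and therefore the concepts they encode) are identical before and after, which is what makes a per-neuron lineage possible.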

Maya Jhamb

Institution: Columbia

Mentor: Farshad Khorrami (NYU)

Adversarial Attacks on Acoustic Side-Channel Models

This project investigates the vulnerability of modern machine learning models to acoustic side-channel attacks — a type of attack where keystrokes typed on a keyboard can be inferred solely from the sound they make. I began by reimplementing a high-performing keystroke classification model based on CoAtNet and trained it on audio recordings of individual key presses. The model achieved high accuracy in recognizing keys based on their unique sound profiles.

I then designed and tested several custom adversarial attacks to reduce the model's predictive power, including white noise addition, fast gradient sign method (FGSM), and novel perturbations like audio blending and echo injection. These experiments demonstrate how small or natural-sounding modifications to keystroke audio can significantly degrade model performance, offering potential defenses against this class of acoustic privacy threats.
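Of the perturbations listed, FGSM has the most compact definition: move the input by a small step `eps` in the direction of the sign of the loss gradient with respect to the input. The sketch below applies it to a small logistic-regression model, which stands in for the CoAtNet classifier purely for illustration; the weights, input features, and `eps` are made-up values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_input_grad(weights, bias, x, label):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input."""
    p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    grad = [(p - label) * w for w in weights]     # d(loss)/d(x_j)
    return loss, grad

def fgsm(weights, bias, x, label, eps):
    """Fast gradient sign method: x_adv = x + eps * sign(grad_x loss)."""
    _, grad = loss_and_input_grad(weights, bias, x, label)
    return [v + eps * ((g > 0) - (g < 0)) for v, g in zip(x, grad)]

# Hypothetical model parameters and one "audio feature" vector with true label 1.
weights, bias = [1.5, -2.0, 0.5], 0.1
x, label, eps = [0.2, -0.4, 0.3], 1, 0.1

clean_loss, _ = loss_and_input_grad(weights, bias, x, label)
x_adv = fgsm(weights, bias, x, label, eps)
adv_loss, _ = loss_and_input_grad(weights, bias, x_adv, label)
print(f"loss before: {clean_loss:.3f}, after FGSM: {adv_loss:.3f}")
```

Each feature changes by at most `eps`, yet the loss rises, which is the sense in which a small perturbation degrades the classifier.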

John Lee

Institution: Princeton

Mentor: Tushar Jois and Rosario Gennaro (CUNY - City College)

Addressing Vulnerabilities of Zero-Knowledge Proofs of Training on Optimum Vicinity

Zero-knowledge proof of training (zkPoT) allows a prover to prove to a verifier that their model was trained correctly on a committed dataset without revealing any information about the model or dataset. zkPoT on optimum vicinity is a novel approach that proves the correctness of training by bounding the distance, denoted epsilon, between the trained model and the mathematically optimal model, for models that can be viewed as the solution to a convex optimization problem. It addresses the issue of rejection sampling and demonstrates significant performance improvements over other approaches.

Despite these benefits, the approach appears to open the door to other issues that pose security risks: 1) if a model is trained to peak validation accuracy but is not within epsilon distance of the optimal model, then the prover must intentionally overfit the model in order to prove that it is “correctly trained”; 2) if we train a model on a dataset (model A), then split the same dataset into a train set and a fine-tune set, train on the former and fine-tune on the latter (model B), and find that model A and model B are not within epsilon distance of each other, then completeness is broken; and 3) if we fine-tune model A on an “unauthorized dataset” (model C) and find that model A and model C are within epsilon distance of each other, then soundness is broken.

In order to address and prevent any malicious exploitation while using the approach, this project uncovers the vulnerabilities in the approach and aims to address them. We show that our speculations are valid for the first problem and prove that the approach is unusable in practical settings.

Atif Mahmood

Institution: CUNY - Queens College

Mentor: Henning Schulzrinne (Columbia)

Secure and Usable Wi-Fi Onboarding via DPP and Enterprise Infrastructure

This project explored secure onboarding of IoT devices to WPA2-Enterprise Wi-Fi networks using the Device Provisioning Protocol (DPP), also known as Easy Connect, and EAP-TLS authentication. Traditional WPA2-Enterprise onboarding mechanisms rely on human interaction, preshared credentials, or some form of user interface, none of which is feasible for many headless IoT devices on the market. We have developed a system that scans a DPP QR code to extract a device's public key and MAC address.

In conjunction, we have developed a way to generate a certificate signing request from that device using a generated key pair, and have successfully transmitted that certificate signing request to a configurator. The configurator forwards the request in an HTTP POST to an API on a server that hosts a FreeRADIUS-compatible certificate authority. The certificate is then signed via this external API and returned to the configurator, which returns it to the enrollee. This enables a fully automated certificate provisioning mechanism for WPA2-Enterprise using existing enterprise infrastructure, creating backwards compatibility with DPP and EAP-TLS.
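The first step of this flow, extracting the public key and MAC address from the scanned QR code, might look roughly like the sketch below. It assumes the QR payload is a DPP bootstrapping URI of the form `DPP:<field>:<value>;...;;` with `M` carrying the MAC address, `K` the base64-encoded public key, and `C` the channel list; the field layout should be checked against the Wi-Fi Easy Connect specification. The function name is hypothetical, and the key in the example is placeholder bytes, not a real public key.

```python
import base64

def parse_dpp_uri(uri):
    """Extract MAC address, public-key bytes, and channel list from a DPP URI."""
    if not uri.startswith("DPP:") or not uri.endswith(";;"):
        raise ValueError("not a DPP bootstrapping URI")
    fields = {}
    for item in uri[4:-2].split(";"):
        key, _, value = item.partition(":")
        fields[key] = value
    if "K" not in fields:
        raise ValueError("missing public key field (K)")
    mac = fields.get("M")
    if mac is not None:
        if len(mac) != 12 or any(c not in "0123456789abcdefABCDEF" for c in mac):
            raise ValueError("malformed MAC field (M)")
        mac = ":".join(mac[i:i + 2] for i in range(0, 12, 2)).lower()
    return {
        "mac": mac,
        "public_key": base64.b64decode(fields["K"], validate=True),
        "channels": fields.get("C"),
    }

# Placeholder key material for illustration only (not a valid public key).
fake_key = base64.b64encode(b"not-a-real-key").decode("ascii")
scanned = f"DPP:C:81/1;M:aabbccddeeff;K:{fake_key};;"
info = parse_dpp_uri(scanned)
```

In the workflow described above, the extracted key would then identify the enrollee whose certificate signing request the configurator forwards to the certificate authority.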

Tomer Nahum

Institution: CUNY - Hunter College

Mentor: Xia Zhou (Columbia)

Attacking Near-Infrared Palm Vein Authentication Systems

Palm biometric systems are an emerging technology being deployed in the commercial sector, for example in Amazon One and Tencent Palm. However, industry is ahead of public research. We wanted to attack these systems to test for vulnerabilities, in addition to working on our longer-term project of collecting a new dataset that simulates real-world conditions. To attack these systems, an attacker would first need to steal someone's palm signature information before using it to spoof a real system.

For the REU, we focused on proving out this first step, testing its feasibility and designing a method to help accomplish it. Palm-vein-based biometric authentication systems take in near-infrared (NIR) images of a person’s palm, highlighting the unique vein pattern that is revealed under NIR light. Obtaining an NIR image of a victim’s palm and veins would not be very practical for an attacker. What is more practical is obtaining a regular image of the victim’s palm, which could be collected by a hidden camera or taken from an innocent-seeming social media post.

Elizabeth Wei

Institution: NYU

Mentor: Eysa Lee (Barnard)

Towards Efficient Pairing-free Security

Modern cryptographic protocols continually seek efficiency and elegance to achieve maximal functionality under the weakest possible assumptions. Algebraic structures have granted cryptography theoretically efficient operations; however, practical considerations remain intricate. Pairings stand out as highly effective yet non-standardized cryptographic assumptions. Although pairings efficiently enable bilinear verification of discrete logarithm equalities, they remain largely unsupported in standardized implementations. This project investigates pairing-free techniques within cryptographic protocol design, aiming to explore novel applications and designs, with particular emphasis on anonymous credentials and verifiable random functions.
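For context on what a pairing-free alternative to checking discrete logarithm equality can look like, the sketch below implements the classic Chaum-Pedersen proof made non-interactive with a Fiat-Shamir hash. It is offered only as a well-known example of the genre, not as the technique this project studies. The group is a toy 10-bit prime-order subgroup chosen for readability and offers no security; the generators and function names are illustrative.

```python
import hashlib
import secrets

# Toy parameters: P = 2*Q + 1 with P, Q prime; G and H generate the order-Q subgroup.
P, Q = 1019, 509
G, H = 4, 9

def challenge(*elements):
    """Fiat-Shamir challenge: hash the public transcript into Z_Q."""
    digest = hashlib.sha256(",".join(map(str, elements)).encode()).digest()
    return int.from_bytes(digest, "big") % Q

def prove(x):
    """Prove that A = G^x and B = H^x share the same exponent x."""
    a, b = pow(G, x, P), pow(H, x, P)
    r = secrets.randbelow(Q)
    t1, t2 = pow(G, r, P), pow(H, r, P)
    c = challenge(G, H, a, b, t1, t2)
    s = (r + c * x) % Q
    return a, b, (t1, t2, s)

def verify(a, b, proof):
    """Check G^s == t1 * A^c and H^s == t2 * B^c (mod P), with no pairing involved."""
    t1, t2, s = proof
    c = challenge(G, H, a, b, t1, t2)
    return (pow(G, s, P) == t1 * pow(a, c, P) % P and
            pow(H, s, P) == t2 * pow(b, c, P) % P)

a, b, proof = prove(123)
ok = verify(a, b, proof)
t1, t2, s = proof
tampered_ok = verify(a, b, (t1, t2, (s + 1) % Q))
```

With a pairing `e`, the same equality could instead be checked directly as `e(A, H) == e(G, B)`; the proof above trades that one-line check for an interactive-style argument that works in any group where discrete logarithms are hard.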

Dominick Gordon

Institution: CUNY - City College

Mentor: Damon McCoy (NYU)

Audit of Youth Privacy and Safety Features In Social Media Apps

This project examined the effectiveness of Instagram’s privacy and safety tools designed for users under the age of 18. Using controlled “teen avatar” accounts, I evaluated whether these tools exist as described in Meta’s public statements, how they function in practice, and whether they can be bypassed. While some protections were in place, such as pop-up messages for searches related to self-harm or body image issues, they could be easily circumvented by using misspellings, alternative phrasing, or searches in other languages.

Harmful content, including posts promoting self-harm, negative body image, and unmonitored accounts run by minors, remained accessible and was often amplified once engaged with. These findings indicate that current safety features provide only limited protection and require significant improvement to address real-world behaviors and risks faced by young users.

Sara Lignell

Institution: Georgetown

Mentor: Damon McCoy (NYU)

Understanding User Experiences and Bot Creation Practices on Character.AI

This project investigates how users engage with Character.AI, a popular chatbot platform that lets users create and interact with AI-powered characters. Using a mixed-methods approach, we examine both the content and structure of chatbot creations as well as the narratives and experiences shared by users in online communities. Our quantitative analysis focuses on patterns in bot themes, configurations, and popularity, while our qualitative work draws from user discussions across multiple Reddit communities. Together, these perspectives shed light on why people use Character.AI, how they navigate its creation tools, and how recent platform changes influence user behavior.