LLM Security Course Project Ideas

A diverse set of research-oriented project prompts spanning information flow, agent security, error profiling, and fault-tolerant system design.

Information, Leakage & Foundations

How much information does a prompt actually contain? Propose ways to estimate the information content or sensitivity of a prompt or response, and study how this relates to model behavior.
Prompt injection as an information-flow violation Reframe prompt injection attacks using ideas from taint tracking or information-flow control.
When do models amplify vs. suppress sensitive information? Analyze how information changes across layers, decoding strategies, or conversation turns.

Security boundaries in LLM agents Study how information flows between agent memory, tools, environment, and the base model.
Indirect prompt injection in agent workflows Measure how untrusted tool outputs or retrieved documents influence agent decisions.
Failure modes in multi-step agent reasoning Characterize how small errors or injected instructions propagate across agent steps.
Can agents be sandboxed? Explore architectural or prompting strategies that limit what an agent can do or leak.

Learning the error profile of an LLM Systematically study when and how models fail: hallucinations, instruction violations, refusal errors, or unsafe compliance.
Do different models fail differently? Compare error distributions across models, sizes, or training regimes.
Are model errors predictable? Investigate whether uncertainty signals, logits, or embeddings can predict failures.

Redundancy for safer LLM outputs Explore majority voting, model ensembles, or self-consistency as security mechanisms.
Diversity vs. redundancy tradeoffs Study whether using multiple diverse models improves robustness over using the same model repeatedly.
Distributed-style LLM systems Apply ideas from distributed systems (replication, quorum, failover) to LLM decision-making.
Byzantine models in an ensemble Analyze what happens when one model behaves maliciously or is compromised.

Adaptive attackers vs. redundant defenses Study how attackers adapt when faced with multiple models or layered defenses.
Adversarial prompts vs. ensemble methods Test whether attacks transfer across models and where redundancy breaks down.
Can we certify robustness empirically? Propose metrics or evaluation protocols for measuring practical robustness.

Benchmarking agent security failures Build a small benchmark capturing common agent-level security issues.
Metrics for “control” in LLM systems Propose ways to measure how well developers retain control over agents and tools.
Human vs. model disagreement as a signal Study whether disagreement between models or humans predicts unsafe outputs.

Toward fault-tolerant AI systems Design and evaluate an LLM system inspired by fault-tolerant distributed computing.
Composable security for agents Explore whether secure components remain secure when composed into agents.
A taxonomy of LLM failures Build a principled classification of errors, attacks, and misalignments.