LLM Security Course Project Ideas
A diverse set of research-oriented project prompts spanning information flow, agent security, error profiling, and fault-tolerant system design.
Information, Leakage & Foundations
- How much information does a prompt actually contain? Propose ways to estimate the information content or sensitivity of a prompt or response, and study how this relates to model behavior.
- Prompt injection as an information-flow violation Reframe prompt injection attacks using ideas from taint tracking or information-flow control.
- When do models amplify vs. suppress sensitive information? Analyze how information changes across layers, decoding strategies, or conversation turns.
Agentic & Tool-Using Systems
- Security boundaries in LLM agents Study how information flows between agent memory, tools, environment, and the base model.
- Indirect prompt injection in agent workflows Measure how untrusted tool outputs or retrieved documents influence agent decisions.
- Failure modes in multi-step agent reasoning Characterize how small errors or injected instructions propagate across agent steps.
- Can agents be sandboxed? Explore architectural or prompting strategies that limit what an agent can do or leak.
Error Profiles & Model Behavior
- Learning the error profile of an LLM Systematically study when and how models fail: hallucinations, instruction violations, refusal errors, or unsafe compliance.
- Do different models fail differently? Compare error distributions across models, sizes, or training regimes.
- Are model errors predictable? Investigate whether uncertainty signals, logits, or embeddings can predict failures.
Redundancy & Fault Tolerance
- Redundancy for safer LLM outputs Explore majority voting, model ensembles, or self-consistency as security mechanisms.
- Diversity vs. redundancy tradeoffs Study whether using multiple diverse models improves robustness over using the same model repeatedly.
- Distributed-style LLM systems Apply ideas from distributed systems (replication, quorum, failover) to LLM decision-making.
- Byzantine models in an ensemble Analyze what happens when one model behaves maliciously or is compromised.
Robustness & Adversarial Behavior
- Adaptive attackers vs. redundant defenses Study how attackers adapt when faced with multiple models or layered defenses.
- Adversarial prompts vs. ensemble methods Test whether attacks transfer across models and where redundancy breaks down.
- Can we certify robustness empirically? Propose metrics or evaluation protocols for measuring practical robustness.
Evaluation & Measurement
- Benchmarking agent security failures Build a small benchmark capturing common agent-level security issues.
- Metrics for “control” in LLM systems Propose ways to measure how well developers retain control over agents and tools.
- Human vs. model disagreement as a signal Study whether disagreement between models or humans predicts unsafe outputs.
Ambitious / Open-Ended
- Toward fault-tolerant AI systems Design and evaluate an LLM system inspired by fault-tolerant distributed computing.
- Composable security for agents Explore whether secure components remain secure when composed into agents.
- A taxonomy of LLM failures Build a principled classification of errors, attacks, and misalignments.