The Vulnerability History Project

A museum of mistakes

to help us engineer secure software

Every blunder has a backstory

You've seen it. Another vulnerability.

Heartbleed. Shellshock. Rowhammer. Spectre. Meltdown. WannaCry. Log4Shell. Always a terrifying name. Always an ocean of information to absorb in exactly zero days.

Well-meaning engineers labor hard to make their software both function and be secure. In the thousands of lines of code they write, test, document, and code review... they make one mistake... and that's the one that ends up becoming famous.

What now?

The media frenzy blows over, and we're left with questions. How does this impact us? Should we keep using this library? How do we know that they learned their lesson? How will we know that we haven't made a similar mistake?

Is anyone taking a systematic look at the complex histories behind vulnerabilities? In VHP, that's what we do.

We dig deeper. We combine state-of-the-art automated repository mining techniques with crowd-sourced curation to collect rich histories of vulnerabilities. We're not about blaming; we're about learning.

Data for researchers

Our database has clean, curated, preprocessed, tagged, and citeable histories of software vulnerabilities in prominent open source projects. VHP is 100% open source, with data available on our GitHub or through our RESTful API.

Projects for students

Everyone learns better from real examples. Instead of contriving hypothetical scenarios, students can see how a vulnerability manifests itself in a project.

Techniques for engineers

Learn from others, and learn from your own history. Using our visualizations and guides, VHP equips you to look for systemic problems in both your product and your process.

A story emerges

The vulnerability is the endpoint of a long journey. First we trace the vulnerability to its source code fix. Then we mine the repository to find the Vulnerability Contributing Commit (VCC). Then we automatically construct a timeline and tag it. A curator then comes along and corrects the data, telling the broader story of the vulnerability.
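To make the repository-mining step concrete, here is a minimal sketch of one common way to look for candidate VCCs: take the lines that the fix deleted and ask `git blame` which earlier commits last touched them. This is an assumption for illustration, not necessarily how VHP's own tooling works. The function names `deleted_lines` and `candidate_vccs` are hypothetical, and the sketch assumes a local Git checkout, a non-merge fix commit, and Python 3.9 or later.

```python
import re
import subprocess
from collections import Counter

# Matches a unified-diff hunk header such as "@@ -10,4 +10,5 @@".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def deleted_lines(diff_text):
    """Map each pre-fix file path to the line numbers the fix removed."""
    removed = {}
    path = None
    old_no = 0                 # current line number in the pre-fix file
    old_left = new_left = 0    # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # We are inside a hunk body.
            if line.startswith("-"):
                if path is not None:
                    removed.setdefault(path, []).append(old_no)
                old_no += 1
                old_left -= 1
            elif line.startswith("+"):
                new_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                # Context line: present in both versions.
                old_no += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("--- "):
            # "--- a/path" names the pre-fix file; /dev/null means a new file.
            name = line[4:].strip()
            path = None if name == "/dev/null" else name.removeprefix("a/")
            continue
        match = HUNK_RE.match(line)
        if match:
            old_no = int(match.group(1))
            old_left = int(match.group(2) or 1)  # omitted count means 1
            new_left = int(match.group(4) or 1)
    return removed


def candidate_vccs(repo, fix_commit):
    """Rank commits by how many of the fix's deleted lines they last touched.

    Heuristic only: a curator still has to confirm which candidate, if any,
    actually introduced the vulnerability.
    """
    def git(*args):
        result = subprocess.run(
            ["git", "-C", repo, *args],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    diff = git("show", "--format=", "--unified=0", fix_commit)
    votes = Counter()
    for path, line_numbers in deleted_lines(diff).items():
        for n in line_numbers:
            # Blame the parent of the fix, where the deleted line still exists.
            blame = git("blame", "--porcelain", "-L", f"{n},{n}",
                        f"{fix_commit}^", "--", path)
            votes[blame.split()[0]] += 1  # first token is the commit hash
    return votes.most_common()
```

A call such as `candidate_vccs("/path/to/checkout", "<fix commit hash>")` returns a ranked list of (commit, line count) pairs. A fix that only adds lines yields no candidates under this heuristic, which is one reason the automated result is a starting point for a curator rather than a final answer.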

With this methodology, what appears to be a "simple coding mistake" becomes a unique narrative. Reality is messy, tedious, and nuanced.

Our philosophy is to always provide more context and let the story tell itself.

Everyone can learn

We want everyone to learn from vulnerability history, so we are building activities, assignments, and projects for educators and trainers to adopt. We target varying levels of technical background, ranging from the introductory programmer to the seasoned software veteran.

Evidence-based paradigm

Every field that has adopted an evidence-based paradigm has been revolutionized. From medicine to traditional engineering, letting data drive decisions advances the field. Software engineering is next.

Mixed Methods

In academic parlance, we are combining quantitative and qualitative methods to collect data. When we can, we automate. But we also believe that not every observation can be automated; some require human judgment as well.

We dig deeper

This is not a dumping ground for CSVs. This is a well of data and analysis that can be revisited for decades to come.

Projects such as CVE and NVD aim to be comprehensive, but remain surface level. Projects like CWE and OWASP offer detailed, systematic taxonomies and lessons, but differentiating what is possible from what is probable is difficult. Automated scanners like Snyk can help you get started fixing your existing problems, but they ignore the socio-technical factors behind the engineering process, leaving teams to repeat their mistakes.

We've noticed that very little is known or recorded about how a vulnerability came about and how it was missed. But modern software engineering produces rich artifacts like Git repositories, pull requests, and bug databases that are ripe for mining.

Indeed, entire academic communities are devoted to understanding and improving engineering practice through mining software repositories. We come from that community, but we also acknowledge that models and experiments are ephemeral. Data is forever.

There's a lot we don't know

All of science is a work-in-progress. For this project to work, we need volunteers to help collect, correct, and curate vulnerabilities. As you peruse the site, watch out for unanswered questions that you can contribute to.