A while back, I had the privilege of leading the creation of CrossVul, a comprehensive source code dataset designed to advance automated vulnerability detection and repair across multiple programming languages. Our paper, “CrossVul: A Cross-Language Vulnerability Dataset with Commit Data,” was published at the ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE).
Recently, we released BinPool, a dataset designed to advance vulnerability detection and binary security analysis. The BinPool paper was published at the 33rd ACM International Conference on the Foundations of Software Engineering (FSE ’25). Unlike many existing datasets that rely on synthetic bugs or source code alone, BinPool focuses on real-world vulnerabilities in binary executables, collected from Debian packages with matching vulnerable and patched versions.