Security vulnerabilities in medical devices can have devastating consequences. They can lead to patient injuries and in the worst cases, death. In our first public 10x talk, we explore how common cybersecurity pitfalls can be mitigated using fuzzing—a modern software testing technique used at scale by companies like Google and Facebook. The high-profile Heartbleed bug is used as an example of how fuzzing can be effective.
Medical Devices, Cybersecurity, Stress Testing, Sanitizers, Fuzzing, C/C++
Bimba Shrestha: The title of this talk is Modern Cybersecurity Practices. This is a very broad title, obviously. I’m not going to cover everything that falls under modern cybersecurity practices and its applications to medical devices. I just picked this so that we’d have a common place to start out.
I’m actually going to talk about something pretty specific: a software stress testing technique called fuzzing, and we’ll get into the details later. Basically, it’s a very effective way to find security vulnerabilities, logic bugs, and a whole host of other things, typically in memory-unsafe languages like C and C++. But it’s also being used in Python and Java, mostly for finding other types of bugs. They’re not finding buffer overflows in Python, but it’s being used there too.
So, when I say cybersecurity, I could be talking about just about anything. I could be talking about the IT side of things, what a system admin would do. We could be talking about cryptography, or web security. Lots of medical devices have a web interface that you can use to interact with the device, and once you provide input to the device, that’s a potential attack vector. There’s a legal and compliance side of things: what is it that you have to prove to the FDA to be able to market your device as secure? There’s the human factors side: if your end user is a physician who doesn’t know what cryptography is, then you wouldn’t want to give them a control panel with all these options; they’re very likely to make the incorrect decisions there.
I’m not going to be talking about any of those things. I’m going to focus on what I call fundamental vulnerabilities. This isn’t, at least not that I know of, an actual technical category; it just lumps together things that shouldn’t happen anymore given how long programming has been around, like buffer overflows, race conditions, or use-after-frees in C/C++ code.
So we’ll get into all of that. But first, to put a little more meat on how dangerous this stuff can be, I came across something in the MAUDE database. For anyone that doesn’t know, MAUDE is a database of incidents involving medical devices: if something happens to a patient while a doctor is using a medical device, they are obligated to report it there so that we can learn from it and track it. I found this on the MAUDE alerts page. There was a problem with an infusion pump, a device with a computer that mixes up a concoction of drugs before it gets put into the patient. Because of a buffer overflow issue, and you can see the buffer overflow highlighted there, I just searched for “buffer overflow” and this is one of the first results, the infusion pump stopped working. It crashed. The patient in this case needed those drugs to survive, didn’t get them, and actually died.
It is obviously really terrifying to think about, but it’s even more terrifying that there are five or six other cases just like this one from just a quick search for “buffer overflow.” If you search for things like virus or malware, there’s even scarier stuff. So there’s a lot of good reason to be investigating this area. Another example that’s not from the medical devices field, but one that’s just as consequential, is the Heartbleed bug, a vulnerability disclosed in 2014 in OpenSSL, a library used across the internet, which sat undetected for about two years and cost a whole bunch of companies a ton of money to clean up. (The 2017 Equifax breach, which exposed data on over 140 million people, actually came through a different unpatched vulnerability, but it’s the same story of a well-understood class of bug causing enormous damage.) And Heartbleed, just because this should be interactive: does anybody know what Heartbleed is? A show of hands. Okay, does someone want to take a stab at explaining it?
Matt Hayden: I can, if you want.
Bimba Shrestha: Yeah, go for it, Matt.
Matt Hayden: So, correct me if I’m wrong, but there was an extension added to SSL where, in order to keep a connection alive, you would send a payload plus the length of the payload as a structure, and there wasn’t a check to catch the case where I sent you just a short payload but claimed, say, 32,000 as the length. I don’t remember the RFC off-hand, it might have been 6520. And the way you could exploit this was to ask for more memory than was allocated for that string, and eventually, after some repetition, you might come across some of the private keys held in memory.
Bimba Shrestha: Yeah, that’s perfect. That’s exactly right. It’s based on the heartbeat protocol, which is specified in RFC 6520. The heartbeat protocol is pretty simple: there’s a server and there’s a client. The client sends a heartbeat request to the server, along with a sequence of bytes that it calls the payload. The server’s job is just to return that same sequence of bytes back to the client, and then the client verifies that the bytes sent equal the bytes received. So it’s just testing the connection with a heartbeat. The problem, like Matt mentioned, is that you can craft the heartbeat request in such a way that when you get back a response, you actually get back more than just the payload you sent. It’s obviously really bad, so we’ll actually do that now. We’ll go into the code and I can show you where this happens. Can everyone see my code?
Matt Hayden: Yes.
Bimba Shrestha: So I have a version of OpenSSL here which has the Heartbleed bug. I’m in one of the library files and this is the culprit method. If you look at what’s happening, it’s pretty straightforward. They initialize some variables. I think the only relevant thing here is that they take whatever data the client gives them, and the RFC says that the first couple of bytes of that data encode, as an unsigned integer, the length of the payload. So you just need to read those bytes as an unsigned integer, and that’s exactly what this is doing: it’s populating this payload variable with the first couple of bytes read as an unsigned integer. Then there’s some other stuff that doesn’t really matter. Then there are these two code paths: you can either be getting a heartbeat request or a response. There’s nothing interesting happening in the response path. In the request path, what happens is you allocate a new buffer, and the buffer is the same size as the payload plus some padding. Then it memcpys whatever it got from the client, the same payload we read from the client, copying payload-many bytes onto this empty buffer it just allocated. Then it sends out the data, and then there’s some cleanup. So that’s basically the heartbeat protocol. The logic is pretty straightforward. But there is a bug here. Does someone want to point out what the bug is, if they’ve already found it? Or if they have a guess as to what it might be?
Reece Stevens: Seems like you just write whatever number you want in the beginning and you would get a buffer of that size back and that’s the payload.
Bimba Shrestha: Yeah. Exactly. This is just a blind memcpy: whatever unsigned integer you get is what it assumes the size of the payload is, regardless of what the actual payload is. So the client could say “I sent you 1,000 bytes” when really they only sent you eight bytes, and this memcpy will just continue copying past the eight bytes, copying whatever happens to be there in memory. If you’re unlucky, that could be confidential data, and it could very well be the private key for the server’s certificate. It’s obviously awful. So just keep this in mind; we’ll come back to it later when we do a quick demo with fuzzing. So that’s Heartbleed. What is the remedy for stuff like this? By “stuff like this” I mean anything that falls under memory issues or use-after-frees, the things you typically find in C/C++ code that just shouldn’t exist anymore. What are some typical remedies?
Matt Hayden: If you’re lucky, you can find it statically.
Bimba Shrestha: Yeah, static analysis. That’s a good first thing to do. Anything else? One thing that lots of people do is use sanitizers. Sanitizers are dynamic testing tools, which just means you have to run the code for them to do anything, and they use compile-time instrumentation, which means that when you compile your code, you pass in some extra flags that add on some bells and whistles so that you can debug properly. Sanitizers let you do a whole bunch of cool things.
Probably the most popular and powerful sanitizer, the one that finds the most bugs, is AddressSanitizer, which comes with most relatively new versions of Clang and GCC. AddressSanitizer lets you find bugs involving memory and addresses, basic bugs like buffer overflows. So if you have an integer array of 100 elements and you try to access something outside of it, you should get an explosion, and if you compile with Clang, pass in this -fsanitize=address flag, and run it, then it will tell you. I’m going to run through these sanitizers very quickly because I want to get to fuzzing, but each sanitizer is obviously complicated, there are a whole bunch of parameters you can play with, and it’s worth diving into if you’re into this stuff. AddressSanitizer will also help you find things like use-after-frees: if you create an array, delete it, and then try to return an element inside of it, and you run that with AddressSanitizer, it will catch it.
There’s ThreadSanitizer, which catches race conditions. It does some other things too, but race conditions are the most important things it finds. So if you have a thread function set a variable to 42, but in the main thread you’re setting it to 43, that’s a race condition; there’s nothing guaranteeing exclusive access to that variable. If you run that with ThreadSanitizer, it will find it pretty quickly.
There’s MemorySanitizer, which lets you find things like this: if you declare an integer array but haven’t, for example, initialized it to zero, then as far as the compiler is concerned that’s an uninitialized value and it doesn’t know what could be there. MemorySanitizer will warn you about stuff like that.
There’s UndefinedBehaviorSanitizer, which, in my experience, hasn’t been all that useful. If you’ve ever run UBSan on any of your code, you’ll find that you get a ton of noise, most of which doesn’t really matter. Some of it does, but most of it is just non-conformance with the C++17 standard. Sometimes it’s helpful, but most of the time, in my experience, it hasn’t been.
So what’s wrong with sanitizers? Why is it that they didn’t catch Heartbleed, or bugs like Heartbleed, as soon as they hit? Does anyone have any guesses?
Reece Stevens: Well, it seems like sanitizers can’t catch input issues, right, you know, it’s like the same problem of blindly accepting user input on a web form. You just don’t know what you’re getting.
Bimba Shrestha: Yeah, I think that’s basically it, Reece. It depends so much on how good you are at coming up with test cases for your unit tests. If you point your sanitizer in the right direction with the right input, it will find your bug, but you still have to come up with a test case that triggers the bug. Developers do their best to come up with a relatively comprehensive test suite, but chances are you’re probably missing a whole bunch of stuff.
So the good news is you can do better. There is something called fuzzing. The article I sent is an introduction to fuzzing; it’s not exactly what I’m going to talk about, but does someone want to try to define fuzzing, or their understanding of it from the article?
Emma Curtis: Like randomly sending data to an application to see how it handles it?
Bimba Shrestha: Yeah, that’s exactly it. It’s just providing random input to a program trying to break it and then recording it when it gets broken. Just right off the cuff, that sounds like a really stupid thing to do because the space of all possible inputs is a whole lot larger than the space of inputs that will cause your program to crash, so it doesn’t really make sense to just randomly throw stuff at it hoping it will break and hoping you will discover something interesting.
In the very beginning, when fuzzing was just getting started, it really was a stupid thing to do, and it wasn’t all that successful, so for quite a while nothing interesting happened in fuzzing research. I think the literature now refers to that as dumb fuzzing, or monkey fuzzing. But it turns out that you can add a whole bunch of bells and whistles to that idea of generating random input and feeding it to a program, and make it really, really effective at finding hard-to-find bugs like Heartbleed. Just to compare testing with fuzzing: when you write unit tests, or any sort of test, you have some API you want to test and you provide it with inputs, like input 1, input 2, input 3. If your test cases cover all the stuff you want to cover, good; if not, you’re missing a whole bunch of stuff.
Fuzzing is different because, in this really simplistic model, you’re running an infinite loop that keeps generating new test cases, passing them into your API, and seeing if it breaks. That’s the only difference. I usually just think about it as generating test cases: you can use AddressSanitizer, or any of the sanitizers, with fuzzing, you’re just getting a lot more test cases. So there’s random fuzzing, which I talked about, where the pseudocode is: generate some input, execute it, try to break the program.
But then there’s also coverage-guided fuzzing, which is what really works. Random fuzzing doesn’t really work unless your input space is really tiny. With coverage-guided fuzzing, the basic gist is that you compile your code with some specific flags, then you choose a random input from a corpus, a data set of input files. It’s like the training set in a machine learning model: just as you initialize a model’s weights from something, you seed the fuzzer with a corpus of starting inputs. So if you were trying to test a JSON parser, your corpus would be a bunch of JSON files, or things that look like JSON files: non-binary files with curly braces and quotation marks.
So you pick a random input from the corpus and you mutate it somehow, and this is where different fuzzers set themselves apart. This is also where you can apply some machine learning, though you can generally do quite well without it; genetic algorithms work really well. It’s a complicated area with really fancy work that usually goes way over my head. But you mutate the input, then pass the mutated input into your program, and because you instrumented your code with those flags when you compiled it, you can record how much of your code base you covered in each execution. So you have a metric of code coverage, and you can use that to guide your fuzzing.
So say you execute this mutated input and record how much code coverage you got. This is just one method, but if new code paths were discovered during this execution, you can consider it a good data point and add it back into your corpus, and then this just keeps going. There’s other stuff you can do too, but improving code coverage is a good method. You basically have this massive jungle of code paths and you’re trying to find interesting areas, and the more code paths you find, the more likely you are to find gruesome bugs. That’s the idea, at least. So that’s coverage-guided fuzzing. Is there anything else? Now we can go back into…
Yujan Shrestha: Just a quick question on the coverage guided fuzzing, like is there like a gradient or something that the fuzzer follows? Like how does it know that what it’s doing is increasing the code coverage, like what to do next? Or is it just…
Bimba Shrestha: In the fuzzers whose details I know, it’s a lot more simplistic. You run the program with an input, and if your code coverage metric increases because of it, you consider that a good input. And then you just keep going.
Yujan Shrestha: Got it. Okay. Cool.
Bimba Shrestha: I think there’s fancier stuff you can do that probably works a lot better. The fuzzer that comes bundled with Clang, libFuzzer, doesn’t have any of that. It has what I just told you, the very basic metric.
Yujan Shrestha: Got it.
Bimba Shrestha: So back to OpenSSL. We’re going to try to find this bug using fuzzing, and it should do it relatively quickly, relatively with respect to how long it took security researchers to find it, which was about two to three years. So let’s try that. The first thing you need to do when fuzzing a library is to write something called a fuzz target. A fuzz target is just the entry point for your fuzzing algorithm.
So if fuzzing is poking a black box from the outside and seeing what breaks, then the fuzz target is the black box, the thing that does the actual work. This is what we’re trying to break, and it can be anything. Here I’ve just included the OpenSSL library, there’s some library boilerplate, and then the interesting part is this line here, SSL_do_handshake. This is the black box we’re trying to stress test. All it’s really doing is taking random data, which the fuzzer will generate, and doing something with it. Here it’s directly populating the payload with it, but you could also use it as a seed for a random number generator, or cut it up in different ways and pick out unsigned integers from it. You could do anything. You just get this blob of data, and you have to do something with it in the test.
So in this case SSL_do_handshake uses the heartbeat protocol, so if there’s something wrong with the heartbeat, we should be able to find it just by exercising this API call. The only thing you need when writing a fuzz target is this one function, LLVMFuzzerTestOneInput. That’s all you really need to get started with fuzzing. So let’s compile it. I have the broken version of OpenSSL, which I downloaded, and I’ll just compile it. I already compiled it before the meeting so this will be pretty fast. When you compile it, you want to use Clang, because Clang has the fuzzer built in, and you’ll want to pass in the debug flag, and you’ll also want to pass in AddressSanitizer, so -fsanitize=address. This is what I was saying before about how you can use fuzzing and sanitizers in a complementary fashion. No part of this is doing any fuzzing yet; this is just instrumenting the OpenSSL library with AddressSanitizer and getting it ready for fuzzing.
David Giese: Would it be possible to increase the font size of it?
Bimba Shrestha: Oh yeah, sorry, can you see it?
David Giese: Great, that’s great.
Bimba Shrestha: Sorry, nobody probably saw that. But basically, you compile it with Clang, you pass the debug flag, and you also pass in the AddressSanitizer flag, which prepares libssl for fuzzing. After you run that, you want to compile the fuzz target with Clang. And I already have that.
So all this is doing is compiling our fuzz target, again with the -g debug flag, passing in AddressSanitizer, and then this other option called fuzzer. The flag is just called fuzzer, but the algorithm behind it is libFuzzer. So you pass in this flag, you add the libssl and libcrypto libraries and the include path, and the executable it outputs is what does the fuzzing: it’s the infinite loop that keeps running, randomly generating things until something breaks. So that’s basically it.
Once you have all that, you should just be able to run this new target and let it run until it finds interesting things. I’m not going to let it keep running, because when I first let it run to find Heartbleed, it took about two hours. We’re not going to wait two hours for that, so I’m just going to stop it.
But trust me, it will find it. In a couple of hours you’ll get something that looks like this: it tells you AddressSanitizer found a buffer overflow at address blah blah blah, in the OpenSSL heartbeat method, on line 2586. If we go back to the library, line 2586 is the memcpy; that’s where the buffer overflow happened. And it found that in two hours. If I had more cores on my VM, it would probably find it in under an hour, probably less than 30 minutes. A lot of large companies just run this on clusters continuously. Google has a project called OSS-Fuzz where they take projects that are part of the global internet infrastructure, stuff like curl and gzip and all these integral libraries, and they fuzz them on a giant cluster of virtual machines. Every single day they find tons and tons of bugs and report them to the maintainers. They do pretty well.
The other thing about Google is that they also run fuzzers internally, and I think I saw a stat that said something like 70 or 80% of the bugs reported internally, at least in C/C++ code, are bugs found by fuzzers. These fuzzers are just churning out buffer overflows and the like, stuff that developers would have a hard time finding. Some of it is probably noise, but 70-80%, even including noise, is a very respectable number of bugs to find automatically. Okay. That’s all I wanted to say here. Does anybody have questions about that example?
Matt Hayden: I have a question, when you ran this, did you find anything besides this?
Bimba Shrestha: I did not. This is the only thing I found. I think that’s also because they’ve already run libFuzzer on OpenSSL, at least on the patch that I’m running. So this one is a pretty contrived example: I think they basically made it so that Heartbleed could be found very quickly to show off the power of fuzzing, but not much else. But I know recently somebody gave a talk at CppCon where they fuzzed random libraries that had a certain number of stars on GitHub, including a JSON parsing library that is very widely used, I’m sure in the medical space and other very critical software spaces, which is what makes it more concerning. These libraries were littered with buffer overflows that were trivial to notice and also trivial to fix, but just hadn’t been caught up until that point. So it does work. In this case I didn’t find anything else because OpenSSL has probably been well tested with fuzzers by now.
Matt Hayden: The difficulty there is if you have 7,000 bugs, you have to kind of have a judgment call for each one to see if it’s exploitable, right?
Bimba Shrestha: Yeah.
Matt Hayden: That was my experience running like open SSL.
Bimba Shrestha: Yeah. If you’ve got a ton of bugs, it is definitely hard. In my experience with fuzzers, the number of bugs you can expect, unless your library is really bad, is probably on the order of one or two a day if you run it continuously. It wouldn’t be 7,000 unless you had a really, really bad library.
David Giese: I have a couple of questions. I don’t know if now is the right time, but going back to the pseudocode for the fuzzing: my understanding is that the sanitizer, the checker, is what’s telling you that hey, you found a bug, and then your code just runs random stuff, but there are no assertions or, I guess, something…
Bimba Shrestha: Yeah. That’s a good point. You could add assertions, like you would in test cases. Here, most people just use “will the program crash” as the assertion, will it SIGSEGV or SIGABRT. But yeah, you could put assertions in there too.
David Giese: I see. That makes sense.
Yujan Shrestha: So is it true that most of this fuzzing usually happens at a really black-box level? It seems like you could take a white-box approach where you fuzz every single function, kind of like a unit test.
Bimba Shrestha: Yeah.
Yujan Shrestha: But that would require a lot of implementation, knowledge about what that function is.
Bimba Shrestha: Right.
Yujan Shrestha: Does most fuzzing happen on the, like the black box level?
Bimba Shrestha: It does. Yeah, it mostly happens on very high-level API functions that do a whole bunch of different things. So it’s probably more similar to an integration test than it is to a unit test.
Yujan Shrestha: Got it. Cool.
Bimba Shrestha: Yeah. So now that fuzzing has become very popular, there’s a comic, I think most people have seen it, where somebody says “get back to work” and the person says they can’t because “my code is compiling”; now I think there are plenty of people who say their code is fuzzing instead. I put this up here because fuzzing is obviously not a silver bullet. You still have to have some knowledge about your code base to be able to point it at the right things. You can’t just call a very simple API method, like trying to break an addition function; that’s totally pointless.
So you still have to have some knowledge about your code base to use this really powerful tool well. And the other thing is there’s a whole bunch of stuff you can catch with unit tests much, much faster than with fuzzing, so this doesn’t mean you should stop writing unit tests or integration tests. Fuzzing is going to catch weird bugs that you would never think to test for; the bugs you can think to test for, you should probably cover with unit tests, because you’ll get a much clearer error message when something breaks. You’re not going to have to dig through a giant log of AddressSanitizer stack traces; you’ll just get something really nice, like “this assert failed.” So fuzzing is not the answer to everything. I think the last couple of things I wanted to mention are that fuzzing is used in large companies to find lots and lots of bugs.
There are tools that do fuzzing for languages other than C/C++. Instead of memory issues, you find logic bugs, maybe via assertions you’ve placed somewhere, or code paths that are likely to trigger logic bugs. I’m not too well informed on how fuzzing happens in Python and Java, but I do know companies are doing it, and doing it well; OSS-Fuzz is the project by Google that I was telling you about. And you don’t need to run fuzzing all the time, though most of the fuzzing infrastructure that I’ve seen does: they’ll have something connected to their continuous integration pipeline that is fuzzing all the time, and as they push changes, the binary gets updated and the fuzzers just keep running. OSS-Fuzz and other tools actually go a step beyond that: when they find a test case that results in a crash, they try to minimize the input to the smallest possible crashing input. They don’t want to give you a massive sequence of bytes that is technically the cause of the bug but takes five hours to run when you test it; they try to make it much more manageable. That’s one thing people do. I think that’s all the stuff I wanted to cover. So fuzzing is not a silver bullet, and it is usually very expensive because you’ll have to run it on a cluster.
But stuff like AddressSanitizer and MemorySanitizer and the other sanitizers is kind of the first step toward fuzzing, so I think the very least most people can do, if they’re writing C/C++ code, is run their unit tests with the sanitizers enabled, because chances are you’ll find something unless you’ve been very careful. The next step after that would be to run fuzzers for a little while on some simple, easy-to-black-box API functions. You don’t have to run it on a cluster; you can just run it on your machine for a couple of hours, or put it in a CI pipeline that runs it for a little while. It doesn’t have to be terribly expensive to be effective. So, yeah, that’s basically it. If anyone has any questions about anything, we can go back and answer them.
Reece Stevens: This was a great presentation. Thank you for going through this. I really enjoyed it.
Bimba Shrestha: Good. Awesome. Thanks, I appreciate it.
Reece Stevens: So I do have one question. You mentioned memory-unsafe languages several times. With Rust, for instance, a lot of the bug classes that fuzzing normally tests for are…
Bimba Shrestha: Yeah.
Reece Stevens: They’re not impossible, obviously, you can always have unsafe code, but they’re trying to be designed out. Do you think that fuzzing still provides value even in languages that have those safeguards in place?
Bimba Shrestha: I think it provides the same value it provides to a language like Python or Java, where you’re much less likely to encounter memory bugs. Logic bugs can still be found using fuzzing; it’s just a way of generating unit tests. It doesn’t necessarily have to result in a buffer overflow or something like that. I don’t know if it would be as useful; you probably wouldn’t find bugs at the same volume as when you run it on a C library like OpenSSL, but my intuition says you’d probably still find something. In many ways, though, Rust really is the answer to a lot of these problems.
Dillon Williams: It seems like putting in a bunch of assertions or exceptions for invariants would make fuzzing on memory-safe code more likely to find something.
Bimba Shrestha: Yeah.
Dillon Williams: Like if you’re writing, there’s no way that could happen, but just in case…
Bimba Shrestha: Yeah…
Dillon Williams: But then maybe later on it turns out that fuzzing finds some way…
Bimba Shrestha: Yeah, I think that’s what I have in mind too when running on memory-safe languages. That’s a good question, though. I think there’s some work being done on Rust, I’m not really too sure, but even in the worst case, because this is all LLVM-based, you could probably find a way to run it on most languages that use LLVM in some way. So I’m sure Rust has something.
David Giese: I like the idea of putting in assertions or invariants. Another type of bug that I've run into on some of our projects that involve processing data, usually from cameras, is divide-by-zero bugs: you have some sort of data analysis pipeline, you're probably testing with a handful of data sets, and certain numerical operations just can't fail on those, so you don't necessarily think, oh yeah, when these two things happen to be the same, it divides out. I don't know.
Bimba Shrestha: Yeah, it would probably be pretty useful there too. With all of these, fuzzing definitely has a cost in terms of the time it takes to find stuff. It's pretty easy to integrate fuzzing into your code base, but it is work, so in some cases it just might not be worth the effort to do this kind of stress testing. I think that's also worth keeping in mind, but generally it's pretty easy to integrate.
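A minimal sketch of that divide-by-zero scenario (the `contrast` function is a hypothetical stand-in for a real pipeline step): hand-picked test images never hit the degenerate case, but random inputs find it almost immediately.

```python
import random

def contrast(pixels):
    """Hypothetical pipeline step: Michelson-style contrast.
    Hand-picked test images never hit the degenerate case."""
    hi, lo = max(pixels), min(pixels)
    return (hi - lo) / (hi + lo)  # blows up when the image is all zeros

def fuzz_contrast(iterations: int = 5000):
    """Throw small random 'images' at the step; an unhandled
    ZeroDivisionError is a finding."""
    random.seed(1)
    for _ in range(iterations):
        pixels = [random.randrange(4) for _ in range(random.randrange(1, 5))]
        try:
            contrast(pixels)
        except ZeroDivisionError:
            return pixels  # the all-zero image
    return None
```

The crashing input here is exactly the kind of flat, all-zero frame that rarely appears in a curated test data set.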
Emma Curtis: I really enjoyed this presentation, and I was also thinking about the way that fuzzing might be used on mobile apps, especially with Android being built on C and C++. I'm curious, I haven't really read into it much, but I'm curious whether you've seen anything about the vulnerabilities fuzzing might catch in a mobile app.
Bimba Shrestha: To be honest, I'm not really sure. I'm very uninformed about Android mobile development, but I think Java has a pretty popular fuzzer that does some of this. And even libraries like OpenSSL are getting called from everywhere; they have Java wrappers and Python wrappers, as they call them. So vulnerabilities found in C and C++ code that's used everywhere would be useful to catch even for mobile devices, I'd imagine. But, yeah, I don't really know too much about the mobile platform space.
Emma Curtis: Thanks, yeah, it was a pretty vague question, but thank you for the answer.
Bimba Shrestha: Thanks.
Yujan Shrestha: Yeah, definitely, thanks a lot for the presentation. I learned a lot from this. One of the practical takeaways for me is that in IEC 62304, you want to divide your software system into various software units, and one of the main reasons for doing this is that you can assign a different risk category to each of these units, class A, B, or C, and the type of testing you do, the type of code review, and that kind of thing can be different for each software unit. I'm thinking anything that's class C, which could include things that are life-critical, like for example the bug that you found with the infusion pump, probably has some critical path there, so perhaps if a software unit was identified as being class C, you would not only do unit tests and all the other testing, but also run fuzzing on it as well.
Bimba Shrestha: Yeah, I think that's a good takeaway. Knowing how much reliability matters and knowing where stress testing matters is good stuff to check first, before doing anything else.
David Giese: One thing that jumps to mind, thinking about this data analysis pipeline issue again: I would imagine that if you're fuzzing where your input is an entire image, it would take way longer to run into a bug, just because the dimensionality of your input is so much larger. So figuring out what part of your program to set the input at could increase the likelihood of finding something.
Bimba Shrestha: Yeah, it's definitely tricky. The most common use cases for fuzzers, not surprisingly, are things that are very easy to black-box but also not extremely high dimensional. You're not processing volumetric images; it's stuff like compression algorithms and parsers. Parsers are probably what's fuzzed the most, because there are tons of bugs to be found in parsers, but the input doesn't have to be that complicated to trigger a bug. I have seen an image processing library that uses fuzzing, though. I forget what it's called, but I think it's out there somewhere. And that's the other point about how you can't just throw fuzzing at everything and make it work: there's a decent amount of work you have to do ahead of time to model what it is that you want to fuzz, and in some ways that's the hardest part if you have a problem like the image one.
Reece Stevens: I'm curious whether fuzzing could also be used to discover bugs in machine learning models. You know, there are all these bugs where you alter one pixel in an image and suddenly it goes from a cat to a hot dog, those kinds of things. Could a systematic, corpus-based fuzzing approach detect those kinds of conditions more rapidly?
Bimba Shrestha: Yeah, I bet you could use the same genetic-algorithm-style mutating of inputs to find something. That would be a useful case: beyond finding crashes in, say, TensorFlow, just finding inputs that make the model predict weird things would be interesting.
Reece Stevens: Yeah.
Matt Hayden: Like a confidence score, you know.
Bimba Shrestha: Yeah.
Matt Hayden: You have a pixel and you say, 89% of this ensemble comes out hot dog, so this is 89% hot dog. When you mentioned monkey fuzzing, I guess I hadn't heard that term before. I wonder if it's related to the Chaos Monkey project, which came out about the same time as fuzzing. It's kind of similar: you decide to disable a node on a network and see if your whole network has a problem functioning.
Bimba Shrestha: Uh…
Matt Hayden: So it’s kind of like a distributed systems approach to fuzzing kind of live.
Bimba Shrestha: Yeah, could be. Is it from around the same time? I don't actually know. When did fuzzing come out, was it in the '80s or something? Is that around when Chaos Monkey…
Matt Hayden: Oh, I didn’t know it was the ‘80’s, wow.
Bimba Shrestha: I think so, I could be wrong. I think there was a paper that came out in the '80s and then no one touched fuzzing for like 20 years, so it really probably took off much later. But, yeah, I guess the monkey term could be from that.
Yujan Shrestha: That’s great. Isn’t this picture wonderful.
Bimba Shrestha: Okay, yeah, I think that's it then. Sorry, I put a link to the slides; there are source links down at the bottom of some of the slides. If you look at the CppCon talks given in the last couple of years, a lot of them are about fuzzing, so fuzzing is definitely increasing in popularity in the C++ community; it gets talked about at CppCon all the time. There are references here, and this GitHub repo in particular has a whole bunch of other cool stuff too. There's also a long PowerPoint deck someone put together about fuzzing, which covers a bunch of things I didn't go into: other types of fuzzing, what exactly libFuzzer does, and AFL, which is another fuzzer, in case anyone else wants to dig into those.
Yujan Shrestha: I just have a quick question about the slide that we're on. On the right, for the coverage-guided approach, is it true that the two steps that require manual fine-tuning up front are choosing the random input and mutating the input?
Bimba Shrestha: Yes for the random input one. For the mutation, the only manual intervention necessary is picking your fuzzer.
Yujan Shrestha: Oh, okay.
Bimba Shrestha: With lib fuzzer, that’s kind of decided for you.
Yujan Shrestha: Okay.
Bimba Shrestha: But, yeah, random input, that's definitely the hard part, because you might have an API that's wonky and hard to test with just a raw sequence of bytes, so you'll have to cut up the bytes somehow to make them meaningful for passing into your function.
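One common pattern for that is a small adapter that carves the fuzzer's flat byte string into the structured arguments the API actually wants. A hedged Python sketch (the `resize` function and its size limit are invented for illustration):

```python
import struct

def resize(width: int, height: int, mode: str):
    """Stub standing in for the real API under test."""
    assert width * height < 2**24, "output allocation too large"
    return (width, height, mode)

def fuzz_entry(data: bytes):
    """Carve a raw byte string into structured arguments."""
    if len(data) < 5:
        return None  # not enough bytes to build a call
    width, height = struct.unpack_from("<HH", data)  # bytes 0-3 -> two uint16s
    mode = "bilinear" if data[4] % 2 else "nearest"  # byte 4 -> enum choice
    return resize(width, height, mode)
```

The fuzzer only ever sees flat bytes; the adapter decides how those bytes become integers, enums, and strings, which is exactly the modeling work described above.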
Yujan Shrestha: Got it. And are there tools that can help you form that corpus? Like, maybe there's some runtime instrumentation that…
Bimba Shrestha: I think so. With Clang's libFuzzer, you can actually just give it some noise or random data and it will try; once it finds a bug, and I don't know exactly how this works, it gives you an output directory full of random byte sequences, and you could use that as a corpus. Most of the time, though, I think the recommendation is to craft the corpus manually so that you start off from a solid base.
Yujan Shrestha: Got it. It’s kind of a tricky situation ‘cause you’re trying to find something that’s unknown.
Bimba Shrestha: Right.
Yujan Shrestha: But you need to kind of read between the lines in order to find that.
Bimba Shrestha: Yeah.
Yujan Shrestha: And if you knew enough about it, you'd just write a unit test. Like, if you knew there was an error condition, you'd just write a unit test.
Bimba Shrestha: It's more like pointing it in a vague direction. For example, with the JSON parser, you don't want binary blobs in your corpus. You want characters and structure that a regular JSON file would have, because it wouldn't make sense to test a JSON parser with an XML file or something. It's about narrowing the scope. And yeah, you could definitely point it in the wrong direction if your corpus doesn't have a wide enough scope.
David Giese: So in the case of a JSON, would you start with a corpus where like maybe you would have a number of valid JSON files?
Bimba Shrestha: Yeah, so the corpus doesn’t have to be stuff that crashes your program. It just has to be someplace to start.
David Giese: I see, so then it will mutate an input, execute it, and if the coverage increases, maybe because it just added a letter to one key, it will say, okay, this made the software do something different. So by giving it that starting point, it…
Bimba Shrestha: Yeah.
David Giese: I see.
Bimba Shrestha: Yeah. It’s giving it some sort of structure.
David Giese: Yeah. It’s a pretty cool deal.
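That mutate, execute, and check-coverage loop can be sketched end to end. This is a Python toy with an invented buggy target; real coverage-guided fuzzers like libFuzzer get coverage from compiler instrumentation rather than a line tracer, but the feedback loop is the same shape.

```python
import random
import sys

def target(data: str):
    """Invented parser with a bug buried three branches deep."""
    if data.startswith("{"):
        if '"' in data:
            if data.endswith("}"):
                raise RuntimeError("parser bug")

def covered_lines(func, data):
    """Record which lines of the target a given input executes."""
    lines = set()
    def tracer(frame, event, arg):
        if event == "line":
            lines.add(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        func(data)
    finally:
        sys.settrace(None)
    return lines

def fuzz(seed_corpus, iterations: int = 20000):
    """Mutate, execute, and keep inputs that reach new coverage."""
    random.seed(0)
    corpus = list(seed_corpus)
    seen = set()
    for _ in range(iterations):
        sample = list(random.choice(corpus))
        pos = random.randrange(len(sample) + 1)
        sample.insert(pos, random.choice('{}"a'))  # one-character mutation
        candidate = "".join(sample)
        try:
            lines = covered_lines(target, candidate)
        except RuntimeError:
            return candidate  # crashing input found
        if not lines <= seen:  # new coverage: promote into the corpus
            seen |= lines
            corpus.append(candidate)
    return None
```

Starting from a seed like `"a"`, the loop keeps inputs that reach each new branch, so the corpus climbs toward the bug one branch at a time instead of having to guess the whole crashing input at once.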
Yujan Shrestha: Can it get stuck, though? Say that with the JSON parser, you had to delete two quotes in order to go from what was a string to an integer. This is kind of an optimization problem, so I guess it could get stuck in some sort of local optimum.
Bimba Shrestha: Yeah.
Yujan Shrestha: Does that happen with these input mutators?
Bimba Shrestha: Honestly, I don’t know enough about the mutators and how they work to properly answer that, but there’s probably some work around is what my intuition says. But, no, I don’t know.
Yujan Shrestha: Cool, yeah, it’s just very interesting, very interesting stuff.
Bimba Shrestha: Yeah, I think so too. Okay, yeah, I think that’s it. Thanks for listening and for the questions.
David Giese: Great! Thanks so much for this, you know, this is great.
Matt Hayden: Thanks, Bimba.
Bimba Shrestha: Sure thing.