Episode 40

Data Science in Cybersecurity with Kevin Hanes and Jon Ramsey

EPISODE SUMMARY

This week on 401 Access Denied, we discuss data science and automation in cybersecurity. Joseph Carson is joined by Cybrary CEO Kevin Hanes and Jon Ramsey to discuss everything from AI and Deep Learning to different classes of approaches including model-based threat intelligence and threat actor attribution.


Joseph Carson:
Hello, everyone. Welcome back to another episode of 401 Access Denied. I'm really excited about today's episode, because we've got another exciting topic for you. Before I get into the details of the topic: first of all, my name is Joseph Carson, and I'm the Chief Security Scientist and Advisory CISO at Thycotic, based in Tallinn, Estonia. And I'm joined by two fantastic, esteemed guests for the show today. So first of all, I'm going to pass it over to Kevin. Kevin, can you tell us who you are and a little bit about your background?

Kevin Hanes:
Thanks, Joe. Well, so Kevin Hanes, this is my second month here at Cybrary. I joined on as the CEO. Super exciting, also super exhausted at this point. I spent the last eight years at a company called Secureworks, which is a large MSSP here in the US, and for the last four or five years of that, I was the COO. So I've lived a lot of the pain points that Cybrary is pointing at, and I'm really excited to be here. Thank you for the intro.

Joseph Carson:
Fantastic. Thanks Kevin. And also we're joined by Jon. So Jon, can you give us a little bit about who you are and a bit about your background?

Jon Ramsey:
Yeah, sure. Hello everyone. My name is Jon Ramsey. I'm really grateful to be here today and have this conversation. I think it's an important one. Up until February, I was the chief technology officer at Secureworks, where Kevin and I met, for about 21 years. My whole professional career has been in security one way or another. While I was at Secureworks, I also had a role as a senior Dell Technologies fellow, so I got to go and see the advances and innovations that Dell Technologies and the strategically aligned businesses were doing, which was a lot of fun. I built out a pretty interesting Machine Learning system at Secureworks, actually a couple of them, to solve a lot of the problems that we have in cyber. Prior to that, I was at the computer emergency response team at Carnegie Mellon University, where I did computer network exploitation and computer network defense. And while I was there, I studied software engineering at CMU to try to understand why so many vulnerabilities exist. And then prior to that, I was at Siemens Corporate Research. So thank you for having me. I'm excited to be here.

Joseph Carson:
Fantastic. So this is going to be an exciting discussion today. Between the three of us, we have quite a wealth of knowledge and experience behind us. And what I've seen, looking at your backgrounds, is that while we've probably never been directly introduced to each other, we've followed the same path in the industry. So I'm really excited to see where this conversation goes. Today's topic is a really exciting one for the audience, because sometimes we segue into different areas. The cybersecurity industry is no longer just about technology. It's no longer just specifically about security, because it segues into so many businesses in different ways. And one of the biggest issues is that we have a major skills shortage. Basically, we have a shortage of people to fill the jobs, and organizations are opening more positions than we have people to fill them.

Joseph Carson:
So it's caused a major challenge, and we're always looking for ways to automate a lot of things. And I've seen a lot of people, such as those from social science backgrounds, coming into the industry to really help bridge that gap between the technology and the human side of things. We've also seen a lot of people coming in, even from a psychology background, to try to make sure that we're building things that people can actually use and understand. So cybersecurity is opening up to many different new industries, and today's topic is all about the area of data science, specifically data science in a security context. I've always been in a part of the industry where we need to automate things, and to automate, we need data. We need good analytics. We need good algorithms. So this is where we can really help people understand the direction and path we're going.

Joseph Carson:
So I'd like to kind of... Kevin or Jon, whichever of you wants to take this first: can you give the audience a bit of an overview of data science, the background, the state of where we are today, and also how important it is for the security industry?

Kevin Hanes:
Jon, you want to kick that off?

Jon Ramsey:
Yeah, sure. I think, like Joe said, the disciplines that are coming into security are increasing, and we need it. I think that's really, really helpful. And of course, with data science, every other industry is being digitally transformed, so we should be digitally transforming the way we approach cyber. So first, maybe we should take a step back and think about what data science is. There's a lot of ambiguity around the term, so if we can bring some clarity to that, then we can think about how data science could help us be better defenders. Data science is a broad term. It generally refers to the class of approaches you use to make inferences, or to learn effectively from data. It can be broken down from traditional big data, with large searches as a form of data science, to what's more recent in that evolution, which is really the area of Artificial Intelligence. Underneath Artificial Intelligence is Machine Learning, and underneath Machine Learning is Deep Learning.

Jon Ramsey:
And so sometimes companies will say, "We're using data science," and they mean they're doing maybe statistical anomaly detection as a form of data science, but that isn't necessarily Machine Learning. Underneath that, in the Machine Learning world and the Deep Learning world, the way to think about it is that there are three components. If someone says they're doing AI, ML, or DL, you ask them, "What are the three components?" And the three components are: first, what do your models look like? That's some representation of the real world, some simplification of it, some kind of state-based model, for example. Second, what is the question you're trying to infer with respect to the model? And third, how are you learning? What is the learning approach taken from the data? If you have those three pieces, modeling, inference, and learning, effectively what you have is a form of Artificial Intelligence or Machine Learning.

Jon Ramsey:
There are lots of different approaches here. And quite frankly, there's a lot of math that exists underneath it. There's predicate calculus, there's linear algebra, there's probability theory. So the discipline that data science is bringing into the cyber realm is really centered on those areas: linear algebra, predicate calculus, and probability theory.
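The three components Jon describes (a model, an inference question, and a learning approach) can be made concrete with a toy sketch. This is purely illustrative and not from the episode: the event tokens, labels, and Laplace-smoothed scoring are all assumptions, but they show where "learning" and "inference" each live in even the simplest classifier.

```python
from collections import Counter

def learn(examples):
    """Learning: estimate per-class token counts from labeled data."""
    counts = {"malicious": Counter(), "benign": Counter()}
    totals = {"malicious": 0, "benign": 0}
    for tokens, label in examples:
        counts[label].update(tokens)
        totals[label] += len(tokens)
    return counts, totals

def infer(model, tokens):
    """Inference: which class makes these tokens more probable?"""
    counts, totals = model
    scores = {}
    for label in counts:
        # Laplace-smoothed likelihood of each token under the class.
        score = 1.0
        for t in tokens:
            score *= (counts[label][t] + 1) / (totals[label] + 1)
        scores[label] = score
    return max(scores, key=scores.get)

# The "model" here is just the learned token statistics per class.
training = [
    (["powershell", "encoded", "download"], "malicious"),
    (["mimikatz", "dump", "credentials"], "malicious"),
    (["printer", "driver", "update"], "benign"),
    (["calendar", "sync", "update"], "benign"),
]
model = learn(training)
print(infer(model, ["encoded", "powershell"]))  # -> malicious
```

A real system would use far richer models, but the separation of the three pieces is the same.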

Joseph Carson:
Anything you want to add to that, Kevin?

Kevin Hanes:
Yeah. From my perspective, sitting in my chair this last year, it really boiled down to how we bring in more automation, the word you said. At the end of the day, I'm 100% sure we could not have done our jobs without some support and assist from the data science technologies that Jon and our team built. There just were not enough people with the skills to do that. We're talking on the order of billions of security events a day going through our systems, and you just can't staff that.

Kevin Hanes:
So you've got to have a way to do that. The other thing I'd say, with the new folks coming in, is that it's always interesting at what point they realize, as they come into the company or get into the field very excited to solve these big data problems and help the world, that there's actually an adversary on the other end of the keyboard who's trying to undo everything that they're doing. It's kind of interesting to see that light bulb come on, because I think in a lot of other fields, it's not necessarily the case that there's someone actively working against them. So that's interesting to see, when that happens and they realize it.

Joseph Carson:
And it's always about staying ahead of the curve as well. You always have to try to keep ahead in defenders versus attackers: on the defenders' side, we have to be successful 100% of the time, while it only takes the attacker being successful once. They only need to find one key to get in the door. So it's always a challenge. And I always feel that the defenders don't get enough, let's say, reward or visibility or support for the job that sometimes goes unseen in the background. The only time you hear about the bad things is when an incident happens, but you don't hear about the 300-plus other days of the year where they've actually been working, keeping the company safe.

Joseph Carson:
So they're always working in the background, sometimes doing amazing work that's hidden and a lot of times goes unrewarded. On the point that Jon was mentioning: when I look at the industry and do research across the board into the companies doing this, I think Machine Learning is advancing the quickest. When I look at the Artificial Intelligence side, I'm still seeing that AI has advanced much more in other industries than it has in cybersecurity. In cybersecurity, I find it's still much more about advanced automation, where we're taking the data and doing a lot more automation, rather than getting to where it's self-learning and self-healing and self-perpetuating, taking the new threats that have been discovered and then actually creating defenses against them.

Joseph Carson:
I think that's where we'll start seeing a bit more of a trend towards real Artificial Intelligence in the security realm, but I don't see that we're quite there yet. Everything I look at seems to be more on the automation side. Maybe I've got a different visibility, but can you tell me what you've seen, your interpretation of the Artificial Intelligence that we have in the security industry today?

Kevin Hanes:
You want to start?

Jon Ramsey:
I think I'll jump in on that. A couple of things about positioning Artificial Intelligence and Machine Learning in the cybersecurity space. Number one is countermeasure development when you learn about a threat... Well, let's step back. I think diversity is a key characteristic of survivability. And sure, the threat only has to be right once, but I don't really believe that's true. Sure, to get in they have to be right once, but they have to be right again to get their data out. They have to be right again through the whole kill chain. They have to be right all along that process. And so to detect the threat, I think we need diversity. And that diversity includes things like... Signatures are not dead.

Jon Ramsey:
They'll never be dead. When we have an indicator of a threat, we should use it, and we should use it against the threat everywhere. The question becomes, "What do you do when you don't have an indicator of a threat?" Well, that's where, instead of these representational techniques, like indicators and signatures and hashes and domain names, you start applying more functional approaches to detection. For example, probability as a means of anomaly detection: we haven't seen this before, it's probabilistically unique, so maybe it could be malicious. You don't start with that as a verdict, but you use it to build context, and now all of a sudden you're building out attack graphs to understand where a threat might be and where they might go, to drive the defense process.
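The probability-as-context idea Jon outlines can be sketched in a few lines. This is a minimal, hypothetical example, assuming process names as the attribute and a made-up history; a rare event is only flagged as worth enriching, never declared malicious on rarity alone.

```python
from collections import Counter

# Hypothetical event history: counts of process names seen recently.
history = ["svchost.exe"] * 500 + ["chrome.exe"] * 300 + ["qwxz.tmp.exe"] * 1
counts = Counter(history)
total = sum(counts.values())

def rarity(process):
    """Smoothed P(process) under the historical distribution."""
    return (counts[process] + 1) / (total + len(counts) + 1)

def is_anomalous(process, threshold=0.01):
    # Rare does not mean malicious; it means "worth adding context to".
    return rarity(process) < threshold

print(is_anomalous("svchost.exe"))   # common: False
print(is_anomalous("qwxz.tmp.exe"))  # probabilistically unique: True
```

In practice the anomaly score would feed a larger context-building step (Jon's attack graphs), not a blocking decision.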

Jon Ramsey:
If you use this diversity of representational and functional techniques to detect the threat, then you have increased the sophistication and time required of that threat almost exponentially, because they have to be thinking, "How do I defeat these multiple types of systems instead of one type of system?" So that's number one. I also think, when it comes to AI and ML, that when we come at these problems, the very first thing we need to do is scope the problem to be as specific as we possibly can. There's this term in the ML world called "no free lunch." You're not going to say, "Here's the problem, here's the data, here's the algorithm," and get results with high efficacy.

Jon Ramsey:
So really think about the scope of the problem, and there are lots of problems that exist in the cyber space that AI and ML can solve. For example, detection of malicious activity, which we just talked about, or dynamic threat actor discovery. Let's use the techniques that are used in social networks to build social networks of threat actors based on shared attributes of TTPs, as an example. Or even threat actor attribution inference, building threat actor sets. Today, all those threats are sort of put together because an incident response team walks in and goes, "Well, this looks like a group we haven't seen before. Let's create a new group." The catalog of threat actor sets is in the hundreds, and there are probably 10,000 different threat actor intrusion sets. There are also inferred relations among events themselves.

Jon Ramsey:
Today, correlation is sort of done rule-based. But what if we learned that these two events have properties that support the correlation of those events? And then threat intelligence overall: I believe we're seeing an evolution of threat intelligence, or the data that is represented via threat intelligence, moving to more of what a data scientist would call a model, a representation of the world that is much more prescriptive. It's much more mathematically consistent, so you can reason about it, and that is really important. Model-based threat intelligence is how we need to represent what's happening from a threat perspective. So you roll all of that together and you can see that there are many use cases for Machine Learning and data science in the cyber domain. Or here, I love this one: there's this whole space called generative adversarial networks.

Jon Ramsey:
It's really two algorithms competing with one another. I generate an image, and when I generate that image, I put it up against another algorithm to see if that algorithm can detect that the image is generated. Why don't we use something like generative adversarial networks to test our countermeasures and the efficacy of those countermeasures, and build a system that tries to beat the system? That space is also very interesting. So it's just a really exciting time in cyber when it comes to taking these kinds of approaches and being able to move the needle from a defensive perspective.
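The adversarial loop Jon describes can be illustrated with a toy sketch. To be clear, this is not a real GAN (there are no neural networks); it is an assumed simplification where a "generator" samples payload features trying to slip past a detector, and the detector tightens its countermeasure each time it is evaded. The entropy feature, thresholds, and update rules are all illustrative.

```python
import random

random.seed(0)

def detector(entropy, threshold):
    """Flag payloads whose entropy-like feature exceeds a threshold."""
    return entropy > threshold

threshold = 6.0  # initial countermeasure setting
evasions = 0
for _ in range(50):
    # "Generator": sample a candidate payload feature near the boundary.
    candidate = random.uniform(4.0, 8.0)
    if detector(candidate, threshold):
        pass  # caught: the countermeasure held this round
    else:
        # Evaded: record it and tighten the countermeasure slightly,
        # mimicking the detector side of the adversarial game.
        evasions += 1
        threshold = max(4.5, threshold - 0.1)

print(f"evasions observed: {evasions}, final threshold: {threshold:.2f}")
```

The point of the exercise is the feedback loop: each successful evasion makes the next one harder, which is the property a GAN-style countermeasure tester would exploit at much larger scale.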

Kevin Hanes:
And Joe, my perspective on this is that I would agree with how you positioned it: there's a lot more to come in terms of things like self-healing and taking the next steps here. But I do feel like today a lot of it is automation, and it's all great; it's creating automation, it's helping. At the end of the day, we're still putting a lot on the shoulders of the men and women in these roles to make pretty tough decisions about a lot of different things, with a lot of nuance, in a very short time. So hopefully we continue to make progress and make those jobs a little easier.

Joseph Carson:
Absolutely. I've been in the industry now getting close to 30 years, and I remember I would always run out of time; you never had enough time in the day to do your job. I was a domain administrator for 100,000 servers. So what did I do? I automated. If I had to do something two or three times, I created a script. I set something up as a batch job, I made a scheduled task, I put something in a Perl script. I looked at ways to scale, and that's what we've done over the years. We've seen organizations where you maybe had one or two people managing 100 machines, 100 employees.

Joseph Carson:
Today, we've got organizations with two people managing 20,000, and a lot of that is also about balance. They can't do everything themselves and they can't be experts in everything. So they've also outsourced where they possibly can in order to keep managing that; they went to service providers. I don't think a lot of SMBs, or even medium-sized companies, can get to that type of threat intelligence themselves, so I think a lot of it will most likely come from service providers and platforms that they will have to plug into. So I think that's definitely the direction, but automation, I think, is the key part here, because we have that shortage, and closing that gap takes automation and data analytics.

Joseph Carson:
I remember working in the early 2000s on detecting things like memory failures and hard disks running out of space. What we would do is take the last six months or last year of data, and you'd see usage increasing. What we could do was extrapolate that data into the future. And that was great, because that's when SQL Reporting and Analysis Services started being able to do a lot more for you, not just looking at the past, but also getting into predictability. We were able to say roughly, "In November, this hard disk is going to run out of space. We need to do something before it runs out, because adding space before it runs out will stop you from getting data corruption. If you wait until the disk runs out, you're going to end up with corruption."
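The extrapolation Joseph describes is essentially a linear trend fit. As a hedged sketch, with made-up usage figures, fitting a least-squares line to monthly disk usage gives both the growth rate and an estimate of when capacity is hit:

```python
# Hypothetical monthly disk usage (GB) for the last six months.
monthly_used_gb = [120, 134, 151, 162, 178, 190]
capacity_gb = 250

# Ordinary least-squares fit of usage = slope * month + intercept.
n = len(monthly_used_gb)
xs = list(range(n))
mean_x = sum(xs) / n
mean_y = sum(monthly_used_gb) / n
slope = sum((x - mean_x) * (y - mean_y)
            for x, y in zip(xs, monthly_used_gb)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# Months from now until the fitted line crosses capacity.
months_until_full = (capacity_gb - intercept) / slope - (n - 1)
print(f"growth ~ {slope:.1f} GB/month; full in ~ {months_until_full:.1f} months")
```

Real capacity planning would account for seasonality and sudden jumps, but this is the core of turning historical data into a forward-looking warning.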

Joseph Carson:
So that's where a lot of those trends started, and for me, that was really just getting my feet wet, paddling in the little streams of data science. I've never delved right into it, but I work a lot in the industry with people who are coming in and doing some amazing things. I watched a data analysis that was done here in Estonia, for example, just to give you an example use case: a traffic light. There was a specific traffic light where a lot of accidents were happening, just a lot of accidents at this specific location. So they started doing some trial runs, and they got glasses that detected eye movements. They wanted to see where people's eyes were when they were going through that intersection.

Joseph Carson:
Ultimately, after doing a lot of trials, day and night and so forth, they took the data that came back from the glasses, analyzed it, and put it into the context of the surrounding view. What they found was that people's eyes were automatically going to an advertisement board, and ultimately the root cause was that the light on the board was too bright. So they dimmed it down. They turned down the light on the advertisement board, because they were able to take all of that eye-movement data and understand what was causing the issue. They also adjusted it for different times of day, because in Estonia the light in summer is very bright and low, and in winter it's dark for a lot of the day.

Joseph Carson:
So they adjusted it, and afterwards the number of accidents at that intersection went down. When I look at those types of cases, where they're using data to understand correlation and context, I see that this is what we really need to be bringing in. And on the things Jon mentioned around threat intelligence: I think threat intelligence is a great area, where if I know somebody else has been hit first and they're able to share that information with the community beforehand, we can raise our defenses. Security needs to be, I would say, almost like a living organism. It should not be static. And that's the problem we have: today's security is very static. It's rule-based, it's signature-based, it's policy-based.

Joseph Carson:
What happens is that it's not very dynamic in response to the threats that change in the world. All of a sudden, when we get a new variant of ransomware or a new variant of malware out there, all the vendors have to re-update their signatures and scan for it. They also have to know that it exists in the first place. So we have a very static process. For me, getting data into our industry, getting data to really start to evolve, will allow security to become a living organism, where it can be dynamic and adaptive and grow. When the threats out there are high, when we start seeing a number of companies become victims of some variant, we can tune our defenses. We can raise the threat level all of a sudden. That means we can, for example, go from certain privileges to least privilege, or to Zero Trust.

Joseph Carson:
All those things can start enhancing the security controls. So that's what I'm really excited about when I get into data and look at how we can evolve the industry. The current state is primarily automation, but in the future we can get security to where it's adaptive, where it can change based on the threats out there, and your threat controls can go up and down depending on what's happening. That's really where we can leverage the data. Also, looking ahead, as with the hard disks and memory failures, if you can take that historical data and project it forward, it would also give us indications about how threats will evolve. So it's just interesting. Jon, do you have any thoughts or comments in that area?

Jon Ramsey:
I have a lot of thoughts. Let me start with a good strategy: especially when there's such a skills shortage, have people only do things people need to do. In other words, if a machine can do it, have a machine do it.

Joseph Carson:
Absolutely.

Jon Ramsey:
So if you take that strategy and apply it to the Machine Learning world, you could have a system that's looking at the events coming in, and that system might rate them with, say, a probability of the event being malicious, with some confidence score. You can think of it as tri-state logic: true, it's malicious; false, it's not malicious; or the third state, I don't know. And if you have a system that does that, you're going to want to get the system to be able to rate as many things as it can.

Jon Ramsey:
You want the system to have ultimate confidence in what it's saying about the event. So in that case, if the system says, "I'm 100% confident that it's malicious," then you don't need a human to look at it. If the system says, "I'm 0% confident that it's malicious," then you need a human to look at it. And once the human, the expert, looks at that and labels it with something, that goes into the training set. That training set then informs the next time something similar comes in, and the confidence begins to increase as humans label it. And that's really, really important: the ability to capture what someone says so that a system can learn it. But there is also the-
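The triage loop Jon is describing can be sketched in a few lines. This is a hedged, minimal sketch: the threshold, field names, and the `triage`/`analyst_labels` helpers are illustrative assumptions, not a real system. High-confidence verdicts are accepted automatically, uncertain events go to a human, and the human's label is captured for retraining.

```python
AUTO_THRESHOLD = 0.95  # confidence needed to skip human review (assumed)

training_set = []   # (event, label) pairs the model can relearn from
analyst_queue = []  # events awaiting expert review

def triage(event, confidence, verdict):
    """Route one scored event: auto-handle it or escalate to a human."""
    if confidence >= AUTO_THRESHOLD:
        training_set.append((event, verdict))  # high confidence: accept
        return "auto"
    analyst_queue.append(event)                # uncertain: human reviews
    return "human"

def analyst_labels(event, label):
    """Capture the expert's decision so the system can learn from it."""
    training_set.append((event, label))

print(triage("conn-1", 0.99, "malicious"))  # auto
print(triage("conn-2", 0.40, "unknown"))    # human
analyst_labels("conn-2", "benign")
print(len(training_set))                    # 2
```

The key design point is the last step: every human decision is recorded in a machine-learnable form, so the "I don't know" bucket shrinks over time.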

Joseph Carson:
I have a question before you move on to the next bit, just on this topic. I completely agree with that. Absolutely, if something comes back flagged, right now my mind tells me to have a human look at it. But when I decide to automate the action or not really depends, I think, on what that action is. For example, if it's safety-related, you have to be very, very, 100% certain: doors opening, let's say, or machines rebooting, rolling back to previous snapshots or previous versions. Depending on what that action is, do you think all actions should be automatic... If it's...

Jon Ramsey:
It's a good question, and I think there's an important nuance here. The nuance is that there's a difference between automation and autonomy. Autonomy is the system making the decision by itself to do something; that something might then be done automatically. So, for example, one of the things you might want to do is isolate a machine as an action, and you can automate the process of isolating a machine. You hit a button and the system goes and blocks the IP address at the entry point, wherever that may be. That's automation. Autonomy is the system, without human input, deciding to isolate the machine. And in cyberspace, we're not anywhere near autonomy. You can see that in organizations that have implemented SOAR systems: the SOAR system is really more about automation, not autonomy.

Jon Ramsey:
So for example, it goes out and collects all the contextual data you need about an IP address or an executable or a domain name. But in most organizations, you don't see those systems actually going out and deciding on their own, autonomously, to block that hash. And the reason is that whenever you take an action in a system, that very action itself represents a risk, and you don't know if you're decreasing the risk or increasing the risk. So this is also a place where data science and Machine Learning can really help, and the space is called multi-objective optimization. For autonomy, if you think about autonomous vehicles, there are multiple objectives. The first objective is to keep the passenger and the driver alive. The second objective is to keep anyone else outside the car alive. The third objective is maybe to get to the destination as fuel-efficiently as possible.

Jon Ramsey:
And the fourth objective could be to get to the destination as pleasurably as possible in terms of the route: take a route that's more scenic, as an example. So here are all these objectives. How do you make a decision in the moment, given that set of objectives? That's the autonomy of it. In our world, we might say, "Don't take the system down, keep the system up and running, don't let the threat take the system down, and make a response that is very contained to that system," and give the system those objectives. The cost of the response should be minimal, as an example. List all those objectives out, and then have the system tell you, or have the system execute, that particular action. And I think we need that. And to your point around adaptation, that's real: you have to change something to adapt. The problem with doing autonomous adaptation is that we don't know if we're introducing more risk than we're preventing. So being able to model that risk, calculate that risk, is where data science can really help.
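One simple way to operationalize the objectives Jon lists is a weighted-sum choice over candidate response actions. This is a hedged sketch: the objectives, weights, action names, and scores are all invented for illustration, and real multi-objective optimization (e.g. Pareto-front methods) is considerably richer than a single weighted sum.

```python
objectives = {          # higher weight = matters more (assumed policy)
    "keep_system_up": 0.5,
    "contain_threat": 0.3,
    "minimize_cost":  0.2,
}

actions = {             # score per objective, 0 (bad) to 1 (good); invented
    "isolate_host":  {"keep_system_up": 0.2, "contain_threat": 0.9, "minimize_cost": 0.6},
    "block_hash":    {"keep_system_up": 0.9, "contain_threat": 0.6, "minimize_cost": 0.9},
    "full_shutdown": {"keep_system_up": 0.0, "contain_threat": 1.0, "minimize_cost": 0.1},
}

def utility(scores):
    """Weighted sum of an action's scores across all objectives."""
    return sum(objectives[k] * scores[k] for k in objectives)

best = max(actions, key=lambda a: utility(actions[a]))
print(best, round(utility(actions[best]), 2))  # block_hash 0.81
```

Changing the weights changes the decision, which is exactly Jon's point: the organization states its objectives explicitly, and the system reasons about the response within them.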

Joseph Carson:
Absolutely. No, that's a great point. And it reminds me, in the past I actually worked in the maritime industry, on a lot of the first autonomous shipping. That's where a lot of my experience with data analytics comes from: looking at safety systems, having multiple layers, and making sure you had as minimal risk as possible. I just want to bring Kevin back in. One of the things I'd like to get from Kevin, specifically around what Cybrary does in the platform for learning and education: I'm interested in data science skills and skill sets, how we're training them, how we bring them into the industry, and where they're coming from. So, just to get an understanding: what skills do we need? How do we get more people into the industry so we can automate and do the things Jon and I are talking about? How do we get there? Can you tell me a bit about what's needed?

Kevin Hanes:
Well, sure. I'll circle back to your main point, but first a tangential point I think is important. If I were to go back in time and ask, "What do the roles in a security program look like?" and you were to write them down and put in the numbers of people you had, my guess is it would probably end up looking like a pyramid. You had a lot of level-one people in your program; you had fewer, but still a good number, of level twos; fewer level threes; et cetera, up the stack of your pyramid. And one of the things that I think is kind of interesting is that we're getting more data science and more Machine Learning, which is then, to your point, leading to more automation and better capabilities to sort out signal and noise.

Kevin Hanes:
I think what's happening is an interesting phenomenon. The pressure point eight years ago or whenever, when I started at Secureworks, was all about the level-one analysts: getting a level-one analyst, never having enough of them, never finding enough. And then, how do you keep them retained and interested in that job? Now, I think a lot of the technology and automation you talk about is doing that level-one type of work. And so now the question is, "Hey, I need a lot more level twos. The work that's actually ending up with the person at the keyboard is more like level-two work." And so it's like, "Well, where do I get those if I don't have level ones?"

Kevin Hanes:
That used to be the way we did it. We would get level ones, and they would learn and grow there and then move into the next step. So it's an interesting question: how do we create all these level twos when we're not really creating level ones? That's one of the things Cybrary's thinking about a lot: not only how do we get many, many, hopefully millions more, people into the space to help, but how do we get them the right skills to not only get the job, but to do the job well and to progress and reach their full potential?

Kevin Hanes:
That's one point. The other point, which you're asking about... I think a lot about this. I don't know if we have the answer yet, but I'm thinking a lot about it with the team: at some point, how do the people in the security program actually interpret this stuff? How do they know how the machine works? I've been on a lot of calls with customers over the years where they have some pretty basic questions like, "Why did you do this or not do that?" And when you hear an analyst try to explain how the system works, you realize, wow, there's a lot of complexity in why something did or didn't happen in that whole chain of things.

Kevin Hanes:
So I think there's probably a first step, which is: how do we teach people how the machine works? The whole idea of how things go through the "is it suspicious, is it malicious, maybe yes or no" process, the whole way that training sets get built, that kind of thing, so they really understand and can think about how the system works, because I think that critical thinking process is really important. So I'd start there. The second step is, as people advance in their careers through the security journey, how do they start to contribute back into the machine?

Kevin Hanes:
So in the beginning, they're working in the machine. As their careers advance, they're working on the machine. How do they start to take what they know and build that back into the system to make us all smarter and better? So those are definitely the things we're thinking about: how to enable more tier twos without the benefit of having this giant pool of tier ones, and then how to help people understand how the machine works and begin to contribute to it.

Joseph Carson:
Absolutely. And one of my concerns is the same, when I look at how we progress in our careers in this industry. We started off on the front lines, whether it was development, support, or sales and marketing, whatever area we came into the industry through, and we progressed through those ranks. But when those front ranks are being replaced with bots, automation, and automated help desks, how do we replace that missing rung? I think that might actually, to your point, be fueling our gap: those entry-level positions are becoming less and less available, when maybe we should have thought more about automating the more complicated side of things first and kept that influx of people coming in.

Joseph Carson:
And when I think about our current environment, the pandemic has changed the way a lot of people work. My traditional way of learning would have been conferences, networking, meeting people, and doing workshops. Online platforms like Cybrary have become so critical to continuous learning because we no longer have those conferences, events, and workshops. We still have to fuel that education and knowledge, not just for new people coming in, but for existing people to keep building their skill sets and knowledge. So I think those are also critically important. One of the things I'd like to move on to, Jon, another question I've got before we start getting into the summary and closing up shortly: what about getting buy-in from the executive board and management in these areas?

Joseph Carson:
Because for me, that's always been the challenge: how do you get them to invest in this? For executives at organizations whose business is not technology, let's say their business is manufacturing, transportation, communications, or even healthcare, how do we get them to buy into doing the right things when it comes to security, and especially data science? How do we make sure they invest in this area? How do you convince the board to get buy-in, and who needs to be involved?

Jon Ramsey:
So just to be clear on the question: how do you convince a board to invest in security overall, or to invest in using data science for security?

Joseph Carson:
I think both. One is getting them to use data science for security. Some businesses might be using it in other areas; they might already have a data science part of the business. How do we get them to also include it in the security portfolio and the security part of the business?

Jon Ramsey:
I think if you have a capability in another part of your organization that's using data science already, I would leverage that as much as I possibly could. There was a period of time when data scientists, also known as super quants, were highly concentrated in the financial vertical. I think that has probably gotten a little bit boring, and now being able to solve a problem where there's an active adversary on the other side of the table using data science is an interesting pull into the cyber domain. The conversation with the board, for me, I've always structured around risk reduced per dollar spent. I look at the board and say, "Look, we're not going to be able to have zero risk. It's asymptotic. We might be able to approach it, but the cost is exponential."

Jon Ramsey:
You can reduce half the risk with twice the dollars. So when you go to the board and look at the problems, you highlight the one where you could reduce the greatest amount of risk per dollar spent. And if there's an opportunity to do that with data science, that's how you position it. If we take a look at the modernization of this approach, we build out a tech stack that isn't just query based, but could be graph based and event based and stream based. You don't use those words to the board, of course, but: if we update our stack, we think we can reduce the risk proportional to the amount of cost. That's generally how I would position any kind of project from a security perspective.
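The "risk reduced per dollar spent" framing Jon describes can be sketched as a simple prioritization exercise. The project names and all figures below are hypothetical, invented purely for illustration:

```python
# Hedged sketch: ranking hypothetical security projects by
# estimated risk reduced per dollar spent. All numbers are made up.

# Each project: (name, estimated annual risk reduced in $, cost in $)
projects = [
    ("EDR rollout",        400_000, 120_000),
    ("Awareness training", 150_000,  30_000),
    ("ML-based triage",    250_000, 100_000),
    ("Cyber insurance",    160_000,  80_000),
]

def risk_per_dollar(project):
    """Risk reduced per dollar spent for one project."""
    name, risk_reduced, cost = project
    return risk_reduced / cost

# Highest risk reduction per dollar first: the order in which
# you would pitch the projects to the board.
ranked = sorted(projects, key=risk_per_dollar, reverse=True)

for name, risk_reduced, cost in ranked:
    print(f"{name}: ${risk_reduced / cost:.2f} risk reduced per $1 spent")
```

Ranking this way surfaces the cheapest risk reduction first, which matches the asymptotic point: each further dollar tends to buy less risk reduction.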

Joseph Carson:
Absolutely. It reminds me, I did a penetration test at a power station quite a few years ago now. And when the CSO and I went to the board to present our findings, we presented fear. We presented the traditional security feedback to the executive board. And it didn't work. The CEO was a smart person; the CFO was smarter. They came back and said, "You scared us. That was a really scary presentation, but we need to see the investment. We need to see the return on investment. We need to see the benefits for employees." When I sat down with them, they said they needed to see tangible return on investment, something with impact. And to your point, one thing I always remember the CFO saying to me: "I need to know, what is the cost of doing nothing versus the cost of doing something?"

Joseph Carson:
What is that gap? Is it 10%? Is it 90%? That ultimately indicated how much they were willing to invest to reduce the risk. Sometimes they might decide to go with cyber insurance, or a cyber captive, or just better awareness training, or invest in technology, or a combination of multiple things. That really drove their decision making. And it made me realize, to your point, that we need to be understanding. My job is to reduce risk: to listen to the business, understand their goals, understand the risks, and then find ways to use my skills and knowledge to reduce risk. That's ultimately how I see my position. I'm a risk reducer.

Joseph Carson:
That's ultimately where I spend my time. So, getting into some closing thoughts: are you optimistic? Are we advancing in the right direction quickly enough? Are there things we can do better, ways to accelerate, or ways to show success? I think one of the things we don't do enough is showing where we've been successful. Sometimes people don't like to do that because it paints a target on the organization. So I'm just interested, both Kevin and Jon: what's next, what's the direction, and what can we do better?

Kevin Hanes:
Well, to answer your question, "Are we doing enough?": it seems like there's absolutely a ton going on. If I just look at the number of startups, for example, and the money going into the problem space, it feels like there's a lot there. To your point of how we bring some of those things into the mainstream faster, I think part of it is confidence. Maybe there's a little more we need to do in the area of confidence, because this idea of taking an action, taking that next step of, "All right, let's move from the benefits of automation to the benefits of, let's say, self-healing, or doing things autonomously," I think all has to do with confidence.

Kevin Hanes:
And back to that point, it's like, "Okay, by doing this, am I creating a bigger issue than by not doing it?" There's probably more that could go into giving us better visibility and, ultimately, confidence that taking that action is the right thing. We know that taking the action does have some risk, but a system could tell us that it's a good risk, the right risk for the organization, based on knowing or predicting the likely outcomes. So building confidence so that we can actually make this next big progression is a pretty important thing, and maybe there's more out there we could do.

Kevin Hanes:
Just one thing back on talking to boards and executives. Never underestimate the value of a data scientist you'd like to go have a beer with, that personality who can just sit down and say, "Let me talk you through it." And I think we need to do something there; maybe that's Cybrary's next thing: how do we create more data scientists who are great conversationalists, great educators for people who aren't data scientists? If you have somebody like that in your organization, you should put a bubble around them, wrap them in bubble wrap, and protect them.

Joseph Carson:
Embrace them as much as possible. You remind me, absolutely, the CFO during that pen test was the data scientist. He was the first who really bridged that gap. He has a financial background, so he knows all about tangibles, data, budgets, and so forth. And that was a big impact. So Jon, any thoughts around that, and any final words for the audience? Are we moving in the right direction? What can we do to get better?

Jon Ramsey:
Yeah, I think we're absolutely moving in the right direction. Are we moving as fast as we need to? Maybe not, but I'm very optimistic. The people who defend our way of life take it seriously, and for many of them it's personal; I'm thankful for them and the work they do. I do want to see the impact of data science on cyber increase. And I think one important aspect of that is we have to get to a position where we can learn globally and infer locally: take lots of data feeds from lots of different places, build models on them, run learning and inference algorithms, and then push those learnings out to the endpoints so that we can collectively defend and learn from one another.

Jon Ramsey:
And that has always been the promise of threat intelligence, but I think data science and the edge computing model that's emerging will really help us. The other thing I'd like to leave the audience with: having built a lot of Machine Learning systems, there are a couple of things I've learned that I'd like to share, and hopefully they help progress your efforts. The first is that you're going to need a multidisciplinary team. You need the security subject matter expert. You need a data scientist, the super quant, someone who understands the math behind the methods. You need a developer, someone to codify it and get it to scale. And you need a data architect and a systems architect, because you need to get the systems to scale, not just the solutions.

Jon Ramsey:
So for any effort I look at, I always have a multidisciplinary team that includes at least one of each of those skill sets. And then we prioritize, in this order. First, iteration and speed are really critical. There's no free lunch. You're going to have to continuously evolve the models, the algorithms, the learning approaches, the data, the features, everything that goes in. So be able to iterate quickly; make sure you have a system you can iterate on. If your iteration cycles depend on a code release cycle, even a two-week sprint, you're not going to get there. It has to be rapid. The second thing is scope the problem. That's the single most important thing when it comes to getting high efficacy. In scoping the problem, you want to make it as small as possible and have a clear determination of what right looks like.

Jon Ramsey:
The third thing is rich data and simple algorithms. If you have to modify an algorithm because you don't have visibility into something, because you're missing a piece of data, go get the data. As an aside, that's why you see the endpoint detection and response space exploding: all of these analytics used to be based on network data, but we didn't actually see what was occurring on the endpoint. Get the data and simplify your algorithms. Rich data and simple algorithms win the day, every day. And then feature selection is critically important.

Jon Ramsey:
Feature selection is choosing the set of data attributes you use in the model that drive its outcome. Feature selection is the hard part. To me, that's still very much the art of it, and that's where the subject matter expert needs to come in and provide the domain context. And then the inference algorithm itself, which everybody thinks is the most important thing, is it Naive Bayes, or Minimax, or Expectimax, or is it Q... All of that, honestly, is the easier part of everything. So that's just a playbook, if you will, on how to approach a cyber problem with Machine Learning and data science.
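The "rich data, simple algorithms" and feature-selection points can be illustrated with a minimal sketch: a hand-rolled Naive Bayes classifier over a handful of expert-chosen endpoint features. The features, events, and labels are all invented for illustration; a real system would train on far richer telemetry:

```python
# Hedged sketch: "rich data, simple algorithms" with a tiny
# hand-rolled Naive Bayes. All events and features are invented.

from collections import defaultdict

# Each event: (features dict, label). The features are the kind of
# endpoint telemetry a subject matter expert might select; the
# algorithm itself stays deliberately simple.
training = [
    ({"signed_binary": 0, "spawns_shell": 1, "network_beacon": 1}, "malicious"),
    ({"signed_binary": 0, "spawns_shell": 1, "network_beacon": 0}, "malicious"),
    ({"signed_binary": 1, "spawns_shell": 0, "network_beacon": 0}, "benign"),
    ({"signed_binary": 1, "spawns_shell": 0, "network_beacon": 1}, "benign"),
    ({"signed_binary": 1, "spawns_shell": 1, "network_beacon": 0}, "benign"),
]

def train(events):
    """Count label frequencies and per-label feature-value frequencies."""
    label_counts = defaultdict(int)
    feature_counts = defaultdict(int)  # (label, feature, value) -> count
    for features, label in events:
        label_counts[label] += 1
        for f, v in features.items():
            feature_counts[(label, f, v)] += 1
    return label_counts, feature_counts

def classify(features, label_counts, feature_counts):
    """Pick the label with the highest Naive Bayes score (Laplace-smoothed)."""
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = count / total  # prior
        for f, v in features.items():
            # +1 / +2 Laplace smoothing for binary feature values
            score *= (feature_counts[(label, f, v)] + 1) / (count + 2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(training)
print(classify({"signed_binary": 0, "spawns_shell": 1, "network_beacon": 1}, *model))
# prints: malicious
```

The interesting work here is in the training data and the choice of features, not in the classifier itself, which is the point: swap in richer telemetry and the same simple algorithm gets better.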

Joseph Carson:
Absolutely. I think you're spot on with all of those. You hit a point that really came home to me: make sure you focus on the problem. Sometimes the solution becomes the focus, and that puts you down the wrong path, because you forget what you were originally trying to solve. So that's a major area. Data veracity is key; having veracity in the data is so important. Knowing the right questions to ask, and having the data that can answer those questions, is the recipe for success, especially in this area. If you know the questions you want to ask and you have the data that can give you those answers, that's the key recipe and the foundation to get you there.

Joseph Carson:
It's been fantastic, it's been a pleasure having you both on the show, and I think this has been really educational for the audience. This is an area that's growing. If you're listening and you're interested in cybersecurity, or you come from a data science background, or you're in cybersecurity and want to learn more about data science, hopefully you got a lot of value from this show. So Kevin, Jon, it's been amazing having you on. I look forward to more discussions with you. And for the audience, this episode of 401 Access Denied has been all about data science. Tune in every two weeks for the next episode, and subscribe to make sure you continually get updates. Jon, Kevin, it's been fantastic. Thank you.

Kevin Hanes:
Thank you.

Jon Ramsey:
Thank you, Joe. Thanks, Kevin.

Joseph Carson:
Thank you.