Today’s guest is Andy Steingruebl, Chief Security Officer at Pinterest, here to talk about DevSecOps, collaboration, and measuring security performance at his place of work! We open with a few details from Andy’s background and how he got into security by working on UNIX systems. After talking about how he splits up his teams, Andy touches on the fact that many issues spill over from one area to another, meaning the lines that divide them are not set in stone and issues get tackled on a case-by-case basis. We shift from security to engineering next, talking about the interaction between application security teams and agile development teams building software. As is often the case, Andy has found that the more communication between the two the better, and he describes how the company culture at Pinterest helps to bolster this practice even further. Secure by default is always a big goal, and Andy talks about the line between using preexisting web frameworks with security baked in and allowing developers to be creative. Next, we dive into the difficult question of how to measure security performance, and Andy shares an approach that centers on measuring the applicability of a security control. Wrapping up for the day, we close with some golden advice from Andy regarding security being about people and collaboration, something we would all do well to remember. Be sure to tune in today!
Season 5, Episode 77
Collaborating On Solutions With Andy Steingruebl
Andy Steingruebl
[00:01:27] Guy Podjarny: Hello, everyone. Welcome back to The Secure Developer. Thanks for tuning back in. Today, we have a very accomplished security leader here. We have Andy Steingruebl who is the Chief Security Officer at Pinterest. Andy, thanks for coming onto the show.
[00:01:40] Andy Steingruebl: Yeah, happy to be here.
[00:01:42] Guy Podjarny: Andy, you’ve done PayPal. You’ve done Pinterest. You’ve done much more. Tell us a little bit about what it is you do and maybe how you got into security in the first place.
[00:01:50] Andy Steingruebl: Sure. So a little bit of a non-traditional path that I guess isn't that common today, but I think used to be a lot more common. I learned to program when I was nine. My family – My dad bought a Tandy Color Computer, a TRS-80, and I don’t know, a long time ago when that was a computer you could buy. I learned to program when I was young but then I didn't pursue it. I did some stuff in high school but ended up – I went to school for first engineering, then physics, and finally ended up graduating with a degree in philosophy. But along the way, I ended up taking a job at a computing lab. I became a UNIX system administrator, and an interesting lesson I learned was you may have an idea of what you want to be, but one way of figuring it out is how you actually spend your time. Not how you say you want to spend your time but how you actually spend it, and I found that I spent all of my time, every spare moment, taking care of, tuning, and improving the UNIX systems that I was responsible for back when.
That became the thing that like I guess I need a job in this, and so I ended up being a UNIX administrator first at a university and then later at a pharmaceutical company and so on. But I learned that half, maybe two-thirds of the job of being a sysadmin at a university was security. You’re trying to harden systems against the users at the time, the students, and I had recently been one myself. They’re actually the ones who are your security threat half the time. They’re trying to – This was in the days of disk quotas and other stuff, people experimenting and doing stuff. So I learned a whole lot of stuff about system security. Had to write our own tools. You couldn’t buy stuff back then. You had to write stuff yourself, and so still had to do some software engineering to be a systems programmer and did a lot of security work and then eventually pivoted full-time to doing security. This was the early days of the Web, and so took on things like building access control systems for the Web, both myself and using third-party packages, writing Web applications and occasionally having them get owned, compromised, etc. You learn a lot from having your stuff broken into and the frustration of that and vowing for that not to happen again and learning to get better at it.
[00:04:11] Guy Podjarny: Yeah, definitely. Kind of the empathy piece. I think this is a great background, and indeed it's interesting to hear how the UNIX admin path led into it. I think it had its heyday. It's almost like now I feel like maybe software engineering, or even a security degree directly, is becoming the more common path into the industry, but there's definitely a certain stretch where the UNIX admin path has been a popular one.
[00:04:35] Andy Steingruebl: I think back when, you actually had to have lots of internals knowledge and write your own software at times to get into it. I tell people the things that I think – maybe it's just me – make me pretty good at doing security stuff are, one, having handled a lot of incidents, because you get that visceral reaction of people breaking your stuff and learning a lot, sometimes through trial and error, of what to do, what not to do, and so on, and then also having to ship stuff. I worked at a startup. You learn how to not let the perfect be the enemy of the good enough when solving things, and what you can get right, what you have to get perfect, what you have to have good enough in order to ship things. I think those are pretty useful things for a security person and a software security person. You can't make everything perfect.
[00:05:20] Guy Podjarny: Yeah. No. Very well said. Also a lot of empathy to the people in that world as you work with them.
[00:05:26] Andy Steingruebl: Definitely, yeah.
[00:05:28] Guy Podjarny: Today – so you went on to manage a lot of product security and did a substantial amount of security work at PayPal, right, and eventually landed as the Chief Security Officer at Pinterest now. Tell us a little bit about maybe the combination of those, or how do you handle product security or application security at Pinterest from an org and team perspective?
[00:05:48] Andy Steingruebl: Yeah, sure. Pinterest is really interesting in that it's a really rapid hyper-growth company – maybe that’s not the right technical word, but really rapid growth – with a huge number of users and amount of traffic for a company that's relatively small in terms of employee footprint. The whole company is somewhere a little over 2,000 people and so on. The numbers obviously are growing and so on, but with over 400 million active users, the volume of that versus the size of the company means being really agile, each person having a really big impact on what it is we do, and a very big influence within the engineering organization.
The production side of my team focuses on the production side of the product. I have a separate team that focuses on enterprise IT systems and so on, also really important for security. But on the production side, we’re pretty much entirely an Amazon shop, and so at least that means that I'm not having to have a whole bunch of people be responsible for some of the core systems engineering and the security for that, because somebody is taking care of it just by nature of being on Amazon. But I split the team roughly in half. One is infrastructure-type security, all the core security services. Think key management and authentication and service-to-service access control and so on, so the security capabilities there. The other part of the team is focused on application and product security. That’s more of a federated team that works with all the product teams that are building user-facing features, functions, etc., ensuring both that we're doing the right application security pieces and also building some of the reusable components and services that are sort of higher up in the stack than core infrastructure. Think user input handling, or think about the libraries you’re going to use for preventing application security type attacks; dealing with those things is the remit of that team.
[00:07:51] Guy Podjarny: How do you handle the move of some of the historically infrastructure-style responsibilities, like containers or maybe infrastructure as code, that might handle OS and network but have moved into the realm of maybe indeed the application, from a team structure perspective? Is that the responsibility of that application security team or is that more on the infra side?
[00:08:13] Andy Steingruebl: I have one leader who's responsible for that whole area. For some of those challenges that sort of split those domains, it’s a whole team effort. Technically, my infra team is responsible for things like container security. Think about how we’re going to configure and use Kubernetes to get the right security model and so on. But there are a lot of problems where there’s tons of overlap, and there's not a clear dividing line. Think about vulnerability management within your application space. Some of those vulnerabilities are in third-party code that you’re building or using directly in the application itself, and others of them are system dependencies, but you could have a flaw in either of those. So like, okay, technically, the container team is the one that's responsible for the bill of materials for the container, but those things can have a software-level impact, not just a systems-level impact. There aren’t bright dividing lines. We tend to just tackle it situationally as to how we make sure we have a good process that captures everything.
[00:09:15] Guy Podjarny: Yeah, that makes a lot of sense. Maybe shifting a bit from security to engineering, let's spend a little bit of time talking about that interaction and what works and doesn't. You’ve got the application security team, and you talked about themes of responsibility. How would you say the division of work, or even the interaction, works between the application security team and these agile development teams building software?
[00:09:40] Andy Steingruebl: Yeah. I mean, the biggest lesson I think I've learned over time is mostly meeting software engineers, developers where they are. In a previous life, or an earlier stage of secure development, people would do things like create a separate bug tracking system for security bugs versus regular bugs, and a lot of us were like, “Oh, because security bugs have to be kept secret and separate, and we need our own workflows.” Then you realize they don't get fixed that way. So one of the biggest realizations is meeting people where they are, whether that's SecDevOps or whatever – those are all new terms that people have come up with – but it’s meeting people where they are and trying to integrate with the existing workflows of how people do their job, making things easy and safe by default so that security isn't everybody's full-time job all day long. Most of them are developing software and are supposed to be building certain things functionally, and they’ve got nonfunctional requirements on top of that. But those are all sorts of nonfunctional requirements, from performance to availability to all sorts of things. You want those to be mostly inherited and happen by default, with a certain testing framework and so on, but not be somebody's primary concern.
So the first thing is to try and approach it that way, and it's one of the best approaches. The other is that you have to have software engineers on your team, actual practicing software engineers, as part of your application security practice. Earlier in my career, I tried leading those things and being the subject matter expert, and I've never been a software engineer for a living. That's not been my day-to-day. I wasn't very effective at it, so one of the lessons learned was having software engineers on your application security team, because they’re way better at interfacing with other people who do that for a living than somebody who spent their career as a Unix sysadmin or a network person or something like that. You can lead the program but you can't do the day-to-day.
[00:11:31] Guy Podjarny: Yeah. I think I very much relate to that. I think also, in part, historically a lot of these same security concerns would've been handled by the IT team, so someone from an IT background would have that sort of empathy and appreciation for their surroundings. But if now a lot of these decisions are made by software engineers, then that’s the skill set that you might need in your team. That makes a lot of sense. So how do you support that secure by default? Is it an expectation or a definition for something the software development team does? Is it your team actually building services or infrastructure or components that then the rest of the engineering organization uses? How does that work?
[00:12:12] Andy Steingruebl: Yeah. It’s a combination of both. I mean, we do hit everybody. Don't get me wrong. Education and some other stuff is important, so we hit everybody with training, awareness, and so on. But we try to keep it at a very practical level rather than a theoretical level, because I don't find it that useful anymore for everybody to understand the guts of every security issue. I'm also kind of lucky in this role that we don't have a lot of legacy stuff we’re taking care of, and so most everything is fairly modern and up-to-date. That just means that when you're using modern web frameworks and platforms and so on, a bunch of the security things you want are baked in there by default. It was a lot of years ago now, but Ruby on Rails made a change at one point to do output escaping by default. Instead of asking people building templates and so on to turn it on, you had to turn it off. When they did that, all of a sudden, cross-site scripting on Ruby on Rails apps plummeted, because that output escaping was just there by default and you didn't have to think about it, unless you went out of your way to turn it off.
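To make the escape-by-default idea concrete, here is a minimal sketch in Python using the Jinja2 templating library, purely as an analogue to the Rails change Andy describes; it is not Pinterest's or Rails' actual code.

```python
# Minimal sketch of "output escaping by default", assuming Jinja2 is installed.
from jinja2 import Environment
from markupsafe import Markup

# Autoescaping is turned on for every template rendered through this environment,
# so template authors get XSS protection without doing anything themselves.
env = Environment(autoescape=True)
template = env.from_string("<p>Hello, {{ name }}!</p>")

# Attacker-controlled input is escaped automatically.
print(template.render(name="<script>alert(1)</script>"))
# -> <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>

# Opting out has to be explicit and visible in the code (wrapping a value in
# Markup), which is exactly the kind of deviation a reviewer or linter can flag.
print(template.render(name=Markup("<em>trusted, pre-sanitized markup</em>")))
```

The design point is the same one Andy makes: the safe behavior is the default, and the unsafe behavior requires a deliberate, reviewable action.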
The great thing about many or most modern web and app frameworks is that a bunch of these things are just baked in, so you get cross-site scripting, CSRF, and so on protection almost for free by using modern frameworks, assuming that you don't go tweaking them or turning those things off. So a lot of the basic things now sort of come for free as part of the packages that we use to develop things. Others that we need, we do go and do ourselves; we’ll build them and incorporate them into various people’s applications. That's more at the infra level than the application level higher up the stack, but if there's a secrets management component, we’ll write the libraries for people and the bindings for them to use in the various languages that we have across the company, for example.
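As a rough illustration of the kind of reusable secrets binding a security team might hand to product teams, here is a sketch in Python. The names (SecretsClient, fetch_secret) and the environment-variable backend are purely hypothetical stand-ins, not Pinterest's actual libraries.

```python
# Hypothetical sketch of a "blessed" secrets binding that product teams call,
# so nobody handles keys or credentials in an ad hoc way.
import os
import time
from dataclasses import dataclass, field


@dataclass
class SecretsClient:
    """Thin wrapper around whatever backing store the infra team actually runs."""
    ttl_seconds: int = 300
    _cache: dict = field(default_factory=dict)

    def fetch_secret(self, name: str) -> str:
        # Serve from a short-lived cache so callers don't hammer the backend.
        entry = self._cache.get(name)
        if entry and time.monotonic() - entry[1] < self.ttl_seconds:
            return entry[0]

        # Placeholder backend call; a real binding would talk to a managed
        # secrets service rather than environment variables.
        value = os.environ.get(f"SECRET_{name.upper()}")
        if value is None:
            raise KeyError(f"secret {name!r} not provisioned for this service")

        self._cache[name] = (value, time.monotonic())
        return value


if __name__ == "__main__":
    client = SecretsClient()
    try:
        print(client.fetch_secret("db_password"))
    except KeyError as exc:
        print(exc)
```

The value is less in any one line than in having a single, centrally owned call path that the security team can audit, rotate behind, and fix in one place.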
[00:14:07] Guy Podjarny: Got it. I, again, very much relate to the choice or sort of the advantage of these frameworks that have these built-in protections. Does it ever conflict with this developer agility bit? How do you help guide that when on the other hand you have probably fairly empowered teams that can make a lot of decisions on their own?
[00:14:26] Andy Steingruebl: Yeah. I mean, it’s one of those culture aspects of how do you manage that because software engineering is still some combination of engineering and creativity and – not art. It’s too strong a word, but there's definitely a creativity aspect to it. When you tie people's hands on things, that also can result in bad outcomes. So we do try to constrain somewhat, though, how many languages, frameworks, and so on we use because ultimately you are actually trying to be productive and solve business and technical problems, not invent a new programming language, at least not in our case, for the most part and so on. There's times where we have to go and invent an area completely from scratch where that's really important to us, and there’s other times when our energy is better spent on reusing something that already exists and not writing our own brand-new web framework or something like that.
Mostly, we try to steer folks towards using those things, and there's times where you even need to tie people's hands a little bit on certain coding practices or whatever. I think some of us remember when Microsoft banned certain function calls way back in the C days, banning strcpy and things like that as being inherently unsafe, and they slowly – Or not slowly, but they did that and then they’ve also banned certain other practices over time. They were the leader for a long time on this, probably still are. There’s a lot to learn from what Microsoft has done. So to the extent we can – I don’t know. We’re not writing a lot of software in C and C++. We’re using memory safe languages, which sometimes conflicts with certain people's performance desires or whatever, but it also results in things that we can better assess, manage, and secure. So it’s a fine balancing act there.
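The banned-functions idea carries over outside of C as well. Here is a minimal sketch in Python of a linter-style check that flags calls from a team-chosen deny list; the list here (eval, exec, pickle.loads) is only an example, not any company's actual policy.

```python
# Tiny AST-based checker that flags calls to functions a team has banned.
import ast
import sys

BANNED_CALLS = {"eval", "exec", "pickle.loads"}  # example deny list only


def call_name(node: ast.Call) -> str:
    """Best-effort dotted name for a call expression."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def find_banned_calls(source: str, filename: str = "<string>"):
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and call_name(node) in BANNED_CALLS:
            findings.append((filename, node.lineno, call_name(node)))
    return findings


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            for fname, lineno, name in find_banned_calls(f.read(), path):
                print(f"{fname}:{lineno}: banned call {name}()")
```

Wired into CI or a pre-commit hook, a check like this makes the constraint automatic rather than something every engineer has to remember.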
[00:16:10] Guy Podjarny: Got it. I think you mentioned culture there. How do you see indeed the balance between the tech and the culture? What type of boundaries might you encounter where you can just sort of make this go away with the technology solution like the choice of the tech platform?
[00:16:26] Andy Steingruebl: I mean, my existing role also makes this really easy. Pinterest has hired an amazing engineering organization, the folks around here. I remember going to a couple of postmortems, which I’ll talk about in a sec because it's just a great practice. Going to a postmortem, seeing that there was an issue, and my team didn't have to solve it. The other engineers at the postmortem who own certain portions of the environment were the ones who said, “Oh, the right fix is to do X, Y, Z. We should centralize the thing so that the logic is not spread around.” I was bewildered because this was a new but awesome thing to me, having such a great engineering organization that they already knew the thing they needed to do and willingly took it on. It's not the usual CISO job of feeling you have to twist people's arms and so on.
One of the company values is be an owner, and I see that left and right. So company culture plays an important role in people seeing issues and tackling them and owning them. It does start from just engineering culture – having a culture of building excellent things, people having pride in ownership, people having trust amongst each other. One of the things about having an excellent engineering organization, and me having awesome engineers on my team, is that there's a lot of trust that when there's an incident, somebody on my team can go and look at another person's code on another team and maybe even go and fix it and apply the change. Rather than at some companies where that would result in turf battles and “you can't touch my stuff,” I’ve got the luxury of having awesome people I work with, and that's actually welcome. It’s like, “Oh, awesome. Thank you for going and fixing that.” Not “how dare you touch my stuff.” So there's a culture aspect of trust and quality and people's ability to collaborate on stuff like that, rather than having turf and domains and so on – people having pride in ownership but also trusting other people to do quality work.
[00:18:21] Guy Podjarny: Yeah. What I probably like most about what you described right now is how you really could've said this whole stretch about something that is not security. You didn't really say security in there. You just talked about ownership and ownership of quality and of your sort of problem domain and whether it was accessibility or performance or operations. You could have kind of the same level of ownership and desire to do the right thing and get it resolved.
[00:18:47] Andy Steingruebl: Yeah. I mean, it starts with visibility. It’s setting goals and telling people what you expect, and trying to make it observable and visible to them so it's not a surprise. They understand what the measuring tape is. Security is a little harder. For performance, you can measure latency or things like that a little more easily than you can measure “is it secure.” But for certain bugs and issue types, you can say, did this pass all of these regression tests? Were all of your dependencies up to date or not? Did you use certain functions that are banned and you're not supposed to be using, and thus one of our automatic linters or rule checkers flagged you because you did something like put a secret in your source code when you're not supposed to? There are certain things that you can automate and make obvious to people and just sort of set a clear expectation for them. When you do that and you have a generally good culture of people wanting to do quality work, that all works really nicely together.
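As an illustration of the kind of automated check Andy mentions, here is a minimal pre-commit-style secret scanner sketched in Python. The patterns are simplified examples for illustration; real scanners, including whatever Pinterest runs, are considerably more sophisticated.

```python
# Sketch of a pre-commit style scan that flags likely secrets before they land.
import re
import sys

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                                 # AWS access key id shape
    re.compile(r"-----BEGIN (?:RSA|EC|OPENSSH) PRIVATE KEY-----"),   # pasted private keys
    re.compile(r"(?i)(password|secret|api_key)\s*=\s*['\"][^'\"]{8,}['\"]"),
]


def scan_file(path: str):
    """Return (line number, line text) pairs that look like hardcoded secrets."""
    findings = []
    with open(path, errors="ignore") as f:
        for lineno, line in enumerate(f, start=1):
            if any(p.search(line) for p in SECRET_PATTERNS):
                findings.append((lineno, line.strip()))
    return findings


if __name__ == "__main__":
    exit_code = 0
    for path in sys.argv[1:]:
        for lineno, text in scan_file(path):
            print(f"{path}:{lineno}: possible secret: {text[:60]}")
            exit_code = 1
    sys.exit(exit_code)  # a non-zero exit blocks the commit when run as a hook
```

The point is the one Andy makes: the expectation is encoded in a check the developer sees immediately, not in a policy document they have to remember.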
[00:19:41] Guy Podjarny: If you’ve opened this door now – how do you measure security? Everything you described around patterns is good for tracking and automating some of that security visibility, but how do you measure whether you're doing well, or whether you're improving or deteriorating?
[00:20:00] Andy Steingruebl: Yeah. Man, it’s the question that just defies answers. I’ve been studying this for a long, long time. There's been some great work that the Carnegie Mellon SEI has done on this and so on. There’s a great paper once upon a time by Jeannette Wing, I think, trying to analyze attack surface in a very methodical way, I think in an FTP daemon – just exactly how many code paths are reachable from the outside, and then are all of those secure against certain bugs. I wish there was some nice metric I could use for that. First, I try to tackle it from a sort of – take the taxonomy, take the Common Weakness Enumeration, CWE or something like that, and say which of these flaws, weaknesses. The OWASP Top 10 is not a good thing to use because it varies over time, but the same idea. Pick some list of attacks you care about and do some sort of assessment of whether you think you've got a proactive, well-defined way of avoiding that issue in your code, and here I'm just talking about software security because that'll constrain our space a little bit.
Take a list of issues or flaws you care about and try to quantify, to the extent you can, whether you’ve got a proactive, well-defined, on-by-default, reusable way of preventing that flaw – a control or software implementation that you can detect deviations from, whether it's a framework or even just a coding style or architecture – that makes sure you don't have that issue within your software, and assess whether you can be sure that it's applied universally and that you don't have deviations from it. Then periodically test that that thing actually works well. Repeatability and automation here are really important. So you build up an output filtering library to stop cross-site scripting, and you make sure that it's universally deployed so that, at least if it has a bug, you can go universally fix it; the same for CSRF.
It’s harder for certain other flaws unless you have a well-defined architectural pattern to avoid them. If you don't let everybody write SQL, and instead have some sort of data access layer or DAO sort of style, then you compartmentalize all of your queries, and it’s not just a question of how you're creating the SQL queries in the first place, whether it’s stored procedures or whatever. You can come up with some reusable patterns at the actual software engineering level that ensure people just don't have access to that stuff. So you’re constraining your choices a little bit, but you can also get a lot higher assurance from those. Then you can build a sort of scorecard – I think James Landis did the periodic table of app sec types and so on at one point. Maybe it’s color coding all those things for yourself: which of these do I have really high confidence I don't have, and which of these do I not have high confidence about being present in my application.
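Here is a minimal sketch of the data access layer pattern described above, in Python with sqlite3 as a stand-in backend. The table, class, and method names are illustrative, not Pinterest's schema or code.

```python
# Product code calls named methods instead of writing SQL; every query inside
# the layer is parameterized, so injection is prevented by construction.
import sqlite3


class UserDAO:
    """The only place in the codebase allowed to build SQL for the users table."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def get_by_email(self, email: str):
        # Placeholders, never string formatting, so attacker input stays data.
        return self._conn.execute(
            "SELECT id, email, display_name FROM users WHERE email = ?",
            (email,),
        ).fetchone()

    def rename(self, user_id: int, display_name: str) -> None:
        self._conn.execute(
            "UPDATE users SET display_name = ? WHERE id = ?",
            (display_name, user_id),
        )
        self._conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, display_name TEXT)")
    conn.execute("INSERT INTO users (email, display_name) VALUES (?, ?)", ("a@example.com", "Ada"))
    dao = UserDAO(conn)
    # Even a hostile-looking input is treated as a value, not as SQL.
    print(dao.get_by_email("a@example.com' OR '1'='1"))  # -> None
    print(dao.get_by_email("a@example.com"))             # -> (1, 'a@example.com', 'Ada')
```

Because all query construction lives in one layer, it is also the kind of control whose coverage you can measure: either a module goes through the DAO or it doesn't.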
[00:22:48] Guy Podjarny: Yeah. I keep asking this question and I'm kind of waiting for someone to have an answer to it. I think you called it correctly, which is it's very, very hard to quantify your kind of risk mitigation level, but a very useful thing you can measure is the applicability of a security control, right, if I understood you correctly.
[00:23:06] Andy Steingruebl: Yup.
[00:23:06] Guy Podjarny: Then you can choose how well you’re handling it or not. But at the very least, you can measure how well it's deployed or how well it's rolled out, so you know you have that leverage.
[00:23:16] Andy Steingruebl: I tell people I'm an optimist about things, but there was a great paper done quite a number of years ago by Andy Ozment and, I think it was, Stuart Schechter. They collaborated on a paper back when about whether code ages like wine or like milk. Does it get better or worse over time? It’s a really great paper, and they did an analysis of OpenBSD security flaws, trying to figure out when bugs were introduced and how long they were in the codebase before they were discovered, and to get a sense of whether codebases get more secure over time and whether you're eradicating those bugs. You would therefore expect most bugs to be really young, because you've gotten rid of all the old bugs. But it turned out not really to be the case. Some number of bugs defied discovery for a very long period of time, even in a codebase that prides itself on security review and having really tight standards. That makes me a little bit pessimistic about certain categories of bugs anyway.
So while we may be able to eradicate certain classes of bugs, certain of them are just going to defy discovery, unless we even more tightly constrain our coding model and so on, back to the previous point. It’s like, “Okay, we could ask everybody to develop in design-by-contract programming languages and things like that. Use functional programming to avoid certain kinds of side effects and so on.” I don't see everybody coding in Haskell overnight.
[00:24:41] Guy Podjarny: That might take a while. It might not be optimal for other things, right? As you said earlier on, you’re trying to find that sort of good, agile, iterative, and nothing is ever done type element. Security is no exception to that. Andy, we talked about a bunch of techniques that you use and things that you do well. What’s an example of something that you will never do again that you’ve tried and you’re not going to do again?
[00:25:06] Andy Steingruebl: Many years ago, we used to think that training every developer to be an expert on all sorts of security issues was the only way to fix things, and maybe this predates having reusable frameworks or things like that. But we used to do extensive in-person, multi-day training – “Let's train everybody on the intricacies of cross-site scripting and cross-site request forgery and so on.” While it seemed like a great idea at the time, most people don't retain it because it's not being used day-to-day. It turns out – I think we’ve talked about it a couple of times – there are much more reliable, repeatable ways to get those outcomes, rather than making everybody an expert on every single one of those issues: giving them visibility or giving them a set of tests to maintain, rather than making them an expert.
I used to do those trainings. We’d run several hundred people through many, many hours of detailed security training on exactly here's how cross-site scripting works and here's how to avoid it and here are all these corner cases. It turns out there's only some number of people you need to train on those things, and maybe they're the ones writing the framework, not every single person using the framework. So your time is better spent building reusable frameworks and systems than trying to make everybody in the organization an expert on all of these deep security topics. Which doesn’t mean people don't need to know about security, but you want to boil it down to a set of “use this thing, do this thing this way,” and then you avoid it. Have a set of prescribed paths to follow, rather than making everybody an expert on every single issue.
[00:26:39] Guy Podjarny: Yeah, I definitely agree. It relates to that security control comment that you made before. If you can narrow it down to a set of specific platforms or reusable pieces used in the majority of cases, then if you make those effective, you can eliminate a lot of the problems in a broad fashion.
[00:26:57] Andy Steingruebl: Yeah. So, I mean, it’s about scale, right? Automation and scale eventually. Rather than manual bespoke solutions to every single problem, come up with repeatable ways of doing things and leverage those as much as you can.
[00:27:09] Guy Podjarny: Yeah. I have a whole bunch of additional questions for you, but I think we’re getting tight on time here, so I’ll just ask you one more. You have a lot of mileage in this world of security. If you met a team looking to level up their security fu, and you had one bit of advice to give that team, what would that advice be?
[00:27:31] Andy Steingruebl: Don't ignore the people aspect of getting this stuff done. So much of security, whether it's in the app space or the product space or whatever, is about collaborating on solutions, not just about technical excellence. One of the best pieces of career advice I ever got was: decide, do you want to be right or do you want to be effective? They're not always the same thing. As engineers, it’s very frequently the case that I want to say this is the right way and I’m going to prove it to you. Or after something happens, say, “I told you so.” I think I need no fingers to count the number of times saying I told you so actually worked out well, where the other person said, “Oh, you're right. You did tell me so, and I'm so happy you’ve pointed it out to me.” Getting stuff done is way more important than being right about something, and that means understanding people and working with people and collaborating together on coming up with solutions. Not saying that it's my one way of doing it or whatever. There are several ways to solve a lot of different problems, and the human element is what's really necessary to up your game, not just technical expertise. That matters, but ultimately getting to a workable solution that people will implement is what matters. Building something and having no one use it is the same as not having built it at all.
[00:28:53] Guy Podjarny: Yeah. Very, very well-said. Andy, this has been a pleasure. Thanks for coming onto the show and sharing some of your wisdom and hard-earned lessons here.
[00:29:01] Andy Steingruebl: Yeah. This is great.
[00:29:03] Guy Podjarny: Thanks, everybody, for tuning in, and I hope you join us for the next one.
[END OF INTERVIEW]