📜 How Bias is Built into Algorithms (w/ Cathy O'Neil)
DEE SMITH: Hello, Cathy. Good to see you. CATHY O’NEIL: Thanks for having me. DEE SMITH: Thank you for coming to visit today. I'd like to talk a little bit first about your interesting background. Because you've got a very interesting pathway that led you to where you are today with some very interesting detours and in ways, and out ways, and byways. So tell me how you got interested in mathematics and what led you to where you are today. CATHY O’NEIL: Wow. I think I started becoming interested in mathematics when I was five. My mom gave me a spirograph set. You know what those are? Like, you draw those little circles inside circles. And I just remember seeing periods-- I'd, like, there was something that had a pattern of six and then something else that had a pattern of three. And so that's where I sort of figured out prime divisors. I figured out that there's a three and six, and there's also a two and six. And then I started thinking about other numbers and how they had different numbers hiding inside them. I didn't know that was math exactly. But I was very fascinated by this idea of primes. My mother is a mathematician, which is probably the biggest influence on me. She got her PhD in the '60s from Harvard in Applied Mathematics. And my father, as well, is a mathematician. So it was kind of the family business, to be honest. It wasn't that much of a stretch. But I wasn't really, really into math because I was really into music, and I wanted to be a musician until the summer I turned 15. And then I had a choice to make between going to a music camp in Switzerland to practice piano five hours a day or to a math camp in Western Massachusetts. And I went to the math camp. And that was kind of the last moment I had anything else in mind until I turned 35. Basically, that summer when I was 15, I decided to become a math professor. DEE SMITH: So you have what I would call a very deep background in mathematics. It doesn't get much earlier than how early you started. And that led you on an interesting journey, and you ended up at the hedge fund DE Shaw right at the time that the financial crisis hit. CATHY O’NEIL: Yeah, that's right. DEE SMITH: As a quant. CATHY O’NEIL: Yeah. I mean, I got to New York. I was an assistant professor at Barnard College. I had done all the things-- get my PhD, become a postdoc, et cetera. And once I became a professor, I was like, actually, you know, I'm not really sure this is suited to my personality. I like people. I like feedback. And being a professor in math, in number theory-- slow field. You know, it wasn't exactly the most stimulating, exciting environment. So I wanted to-- but I loved New York. So I got a job in 2006, early 2007, to work in the hedge fund DE Shaw. And I left academia and went there in June of that year. And almost immediately the financial crisis hit. DEE SMITH: So what was it like to be there? That was an important time. What was it like to be there with the financial-- CATHY O’NEIL: Right. I mean, I was working with Larry Summers on his projects. So it was very central to the financial system with these fancy people. It was weird. It was really weird. I mean, it was disillusioning, I think is the right way of thinking about it. I had not known much. I was very naive. Politically, I was sort of apolitical. I figured that people in finance really were experts, knew what they were doing. I was one of those people that was like Alan Greenspan is so cryptic that he must be a brilliant man, you know? 
Once I was working inside the financial system when it was cratering, I was like, wow, I'm not so sure that they know what they're doing. I'm pretty sure that kind of nobody knows what anybody's doing. Or they know maybe locally what they're doing. But most people don't have a real sense of it, and of the scale of it, especially. DEE SMITH: What do you ascribe that to-- the fact that they believe they know what they're doing, but they really don't? CATHY O’NEIL: It's a good question. I think it's telescoping. It's modeling and telescoping. OK, so let me unpack that a little bit. Basically, the people that we thought knew what they were doing were all economists. And economists-- what they do is they simplify the world into toy universes, these models that they have for the economy, or for people themselves-- like, people are rational agents that know how to get what they want. They have utility functions, blah, blah, blah. Everything is simplified. Everything is actually more complicated. And at some level, I think economists know that. But at a high level, they really do think in terms of models. And I think the people that were kind of running the Fed, running the economy, at the time had a very simplistic and incorrect model of derivatives in particular, like what derivatives were doing. They just really modeled them as, like, spreading risk. And they thought of it as diversification, which is a good thing. If we all have a little bit of risk, it's better than having risk concentrated in one place. That was their model. What, in fact, was happening was that risk was proliferating because of the opacity of the instruments that were being traded so much, like the mortgage-backed securities with the credit default swaps. That stuff wasn't simply spreading risk. It was creating risk and spreading risk. And yes, we did spread risk. That's one of the things that happened. And that's why when the crisis happened, it hit the world, rather than just a few banks, right? So yes, there was spreading. But there was so much proliferation. There was so much expansion of the housing bubble itself because of the derivatives, because of the markets. It was like everything was greasing every other wheel at the same time, and nobody was really keeping track. DEE SMITH: So spreading risk actually made risk worse. CATHY O’NEIL: In this instance, absolutely. I mean, if you had a certain amount of risk and you just spread it thinner, and thinner, and thinner, that might be a good thing. But if you have a machine that builds risk and spreads it, then it's hard to keep track. And that's a closer analogy to what was going on. DEE SMITH: So you know, the question of the models is a very interesting question. Because people make these models, and then they become convinced-- for whatever reasons of human nature, needing to feel like we are in control of things or that we actually understand what's going on when we don't-- we convince ourselves that the models do represent reality. I mean, this is true not just in financial systems, but in companies, you know, where people make these complex spreadsheet models, and they think they reflect reality. And of course, they don't. And you know, that's another related kind of problem. How do you see the reality of modeling having evolved, if at all, since the financial crisis? CATHY O’NEIL: I mean, certainly, models are used more if you include in that not just economic models, like we're discussing, but things like algorithms.
And think about all the VC-funded big data, AI companies nowadays. They are all based on the assumption that predictive analytics, AI, or whatever you want to call it will be able to solve problems. So they're sort of inherently assuming that modeling works, or works well enough, for the company-- Uber, all those companies that are sort of using the algorithm as a basic business model. DEE SMITH: And what do you think? Do you think that-- I mean, give me your opinion of that. CATHY O’NEIL: I mean, you know, full disclosure, I have an algorithmic auditing company. The point of my company is to poke at that assumption. And the very short answer is it's a very narrow view. Most algorithms work narrowly, in the way that the company that built them and is deploying them wants them to work, but probably fail in a lot of other ways. And those failures might be unimportant to that company. But they might be important to the targets of the algorithm, or show up as some other side effect of the algorithm. So I mean, just going back to the way that economists thought about derivatives, like the way they talked about it, the way they thought about it-- it worked for them. Put that in quotes. Because that's what you'll find over and over again with models. Models are working for the people that are using them, whether that's because the data looked good and they weren't looking at other data, or because it worked for them politically, or because they kept on getting better and better jobs when they talked about how great these models were. You could even think of it as a corruption in a certain sense. Because working politically for them is still working for them, right? I'm just saying that that is a very narrow view. And the real question isn't, does this work for you-- because yes, it does. You wouldn't be doing it if it didn't-- the real question is, for whom does this fail? For whom does this fail? And that's the question that isn't being asked, wasn't being asked then, still isn't being asked now. DEE SMITH: But also, the definition of working for you can also be misleading. I mean, what does it mean to be working? It worked for you for the moment. Maybe you got a promotion or something. But if it brought down your company, is that really working for you? CATHY O’NEIL: If you got another job. I mean, that's the thing-- people don't quite understand how cynical the world of finance was at that time. I would talk to people about that. Like, oh, this model seems flawed, seems like, as a business model, it's dangerous for the company. Oh, but I'm just going to jump ship when it fails, and I'll get another job. And that was the assumption. So it's a very, very narrow perspective. And yeah, working, in the case of many of those models that we saw fail during the crisis, simply meant short-term profit. I mean, it was very simple. It was very money-based. The kind of algorithm that I think about now-- let's talk about the Facebook news feed algorithm-- works for Facebook, again, and it ends up translating into money. But the short-term, sort of more direct definition is engagement, like keeping people on Facebook. So we're going to privilege the news feed items that keep people on Facebook. We're going to demote the items that people tend to leave Facebook after reading or seeing. And just that one thing-- of course, it is aligned with profit, because the longer people stay on Facebook, the more they click on ads, the more money Facebook makes. And so that's their narrow definition of working.
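As a rough sketch of that mechanic-- not Facebook's actual system, just the privilege-and-demote logic described above, with invented items and engagement probabilities:

```python
# A tiny, invented example of the privilege-and-demote logic described above.
# "Success" is defined by the platform owner as keeping the user on the site,
# so items are ranked by that predicted probability and nothing else.

feed_items = [
    {"id": "calm_longread",  "p_user_stays": 0.31},
    {"id": "outrage_bait",   "p_user_stays": 0.78},
    {"id": "friend_photo",   "p_user_stays": 0.55},
    {"id": "nuanced_debate", "p_user_stays": 0.40},
]

# Privilege the items predicted to keep people engaged; demote the rest.
ranked = sorted(feed_items, key=lambda item: item["p_user_stays"], reverse=True)

for item in ranked:
    print(item["id"], item["p_user_stays"])
```

Nothing in that ranking rule asks whether an item is true, divisive, or good for the reader; it optimizes the owner's definition of success only.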
They're like, this is working because we're making more money. It's very clear that that's their incentive. But what we've seen in the last few years-- and it was pretty predictable, actually, looking back at it-- is that that also privileged things that we find outrageous. Because why do we stay on Facebook? To argue. Things that we find divisive. Why do we stay on Facebook? To get outraged, to fight with each other. DEE SMITH: To be part of a group that excludes others. CATHY O’NEIL: Yeah, or to even be radicalized, and to find your people, your new radicalized people. There's all sorts of stories we've heard. What it doesn't privilege is thoughtful discussions that make you go and do your own research at a library. Like, we all know that, right? That's not happening. So we've seen that when Facebook optimizes to its bottom line, its definition of success, which is profit, it's also optimizing directly away from our definition of success, which is being informed and not fighting. DEE SMITH: And there's an even further irony to it, which is that by optimizing to that narrow definition of working for them, they've put their company in the crosshairs. In other words, it may not work for them. It may be an extremely corrosive thing for the company itself in the longer term, in the bigger picture. And yet they've been focused on this very short term-- short-termism, of course, is one of the great problems of our age. But to back up for a minute. I want to ask you about what an algorithm is. Because it is a term that's thrown around all the time. And you know, I'm a music and math person too. And there is a kind of beauty and crystalline structure to mathematics that makes people think that it doesn't lie. And mathematics indeed does have proofs, and mathematics itself doesn't lie. But the assumptions behind mathematics most certainly can lie. Walk me through a definition, for people who maybe don't understand it when they throw around the word algorithm: what is an algorithm? CATHY O’NEIL: OK. I'm just going to back up and disagree with one thing you just said, which is that I feel like axioms in mathematics, if stated as axioms, are not lies. They're just assumptions. The thing that we're going to see in my explanation of what an algorithm is is that it's not mathematics at all, actually. So what is an algorithm? When I say algorithm, I really mean predictive algorithm. Because taken straight up, an algorithm just means a series of instructions. Like, that's not what I mean. I mean a predictive algorithm. And what I mean by that is that almost every algorithm I will discuss is predicting. And not just predicting something, predicting a person. Most of the examples I talk about predict a person. Are you going to pay back this loan? Are you going to have a car crash? How much should we charge you for car insurance? Are you going to get sick? How much should we charge you for health insurance? Are you going to do a good job at this job? Should we hire you? Are you going to get rearrested after leaving prison? That's your crime risk score. It's a prediction. It's a scoring system. It's even more precise. It's a scoring system on humans. Like, if your score is above 77, you get the job. If it's below 77, you don't get the job-- simply that kind of thing. But more generally, a predictive algorithm is an algorithm that predicts success. Success is the thing I've just been mentioning in those examples. Like, are you going to click? Are you going to get in a car crash?
Those are the definitions of success. A specific event. And the reason you have to be so precise about that is because you train your algorithm on historical data. So you go back 10 years, 20 years. This is what I did when I was working as a quant in finance. You look for statistical patterns. You're looking in particular at initial conditions that later led to success. So people like you got raises. People like you got hired. People like you got promoted in this company. So we're going to hire you because we think your chances of getting a raise, getting promoted, and staying at the company are good. DEE SMITH: Because you match the pattern of people who have had that happen. CATHY O’NEIL: Exactly. And so the inherent thing is that things that happened in the past, we're predicting will happen again. But we have to define what that means. Like, what particular thing is going to happen. That's the definition of success. So really, to build an algorithm, a predictive algorithm, you just need past data and this definition of success. And that's it. And then you can propagate patterns from the past into the future. DEE SMITH: And so how does that play out into-- I've heard you give a wonderful real-world example-- CATHY O’NEIL: Oh, yeah. DEE SMITH: --of your-- CATHY O’NEIL: Of my kids. DEE SMITH: Your kid. CATHY O’NEIL: Yeah, yeah. So I talk about this. Because I mean, I really do think it's, like, a no-brainer. It's not complicated. It's something we do every day. Sometimes we give the example of getting dressed in the morning. Like, what am I going to wear? You have a lot of memories. It doesn't have to be formal. It doesn't have to be in a database. It's just, like, memories in your head-- things I wore in the past. Was I comfortable? If that's the definition of success for you today, being comfortable, you have a lot of memories to decide what to wear if you want to be comfortable. If you want to look professional, then you have memories to help you look professional. Another example I'd like to give, though, that shows more of the social structure of predictive algorithms and how things can go wrong, is cooking dinner for my family. So I cook dinner for my three sons and my husband. And I want to know what to cook. And so I think back to my memories of cooking for them. This guy likes carrots, but only when they're raw. This guy, you know, doesn't eat pasta, but he likes bread. And then I cook a meal. Of course, it depends on what ingredients are in my kitchen. So that's data I need to know. How much time do I have-- also data I need to know. But at the end of the day, I cook something. We eat it together. And then I assess: was this successful? And that's when you need to know what my definition of success was. And my definition of success is, did my kids eat vegetables? And I say this because I want to contrast it against my youngest son, Wolfie, whose, like, only goal in life is to eat Nutella. Like, his definition of success, if he were in charge, would be, did I get to have Nutella? And so the two lessons to learn from that are, first of all-- well, the first thing is that it matters what the definition of success is. Because I'm not just asking to know. I'm asking because I'm going to remember this was successful, this wasn't successful. And in the future, I'm going to optimize to success. I'm going to make more and more meals that were successful in the past because I think they'll be successful again. That's how we do it. We optimize to success.
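A minimal sketch of that recipe-- historical data plus an owner-chosen definition of success, and nothing else-- might look like this. The records are invented; the cutoff of 77 echoes the arbitrary threshold mentioned earlier:

```python
# A minimal, invented sketch: past data + a definition of success = a score.

# Step 1: historical data -- who was hired before, and what happened to them.
past_employees = [
    {"degree": "engineering", "got_promoted": True},
    {"degree": "engineering", "got_promoted": True},
    {"degree": "engineering", "got_promoted": False},
    {"degree": "music",       "got_promoted": False},
    {"degree": "music",       "got_promoted": False},
]

# Step 2: the owner's definition of success (here: the person got promoted).
def succeeded(person):
    return person["got_promoted"]

# Step 3: score a new applicant by how often "people like them" succeeded.
def score(applicant, history):
    similar = [p for p in history if p["degree"] == applicant["degree"]]
    if not similar:
        return 0
    return 100 * sum(succeeded(p) for p in similar) / len(similar)

applicant = {"degree": "music"}
s = score(applicant, past_employees)
print(s, "hire" if s >= 77 else "reject")  # the arbitrary cutoff from the example above
```

The whole apparatus is pattern matching against the past; whoever picks the success column and the cutoff decides what "working" means.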
And the meals that I make with my definition of success are very different meals than I would make off of my son's definition of success. So that's one really important point. But the other just as important point is that I get to decide what the definition of success is because I'm in charge. And my son is not in charge. And so the point I'm trying to make is it's about power. Predictive algorithms are optimized to the success defined by their owners, by their builders, by their deployers, and it's all about the power dynamic. So when we're scoring people, the people who are being scored might not agree with the definition of success, but they don't get a vote. The people who are owning the algorithm, the scorers, are the ones who say, here's what I mean by a good score. And that could seriously be different for the person who's being scored. And for that matter, many of the examples I wrote about in my book, they're just unfair. They're simply unfair. Never mind the definition of success. They might even be defined in a reasonable way. But the score is actually computed in an unfair way. And the people who are being scored have really no appeals system. So the typical situation for a real power relationship-- and most of the examples I just gave are like that-- insurance, credit, getting a job, or sentencing to prison-- all of those are examples where the standard setup is the company who uses the scoring system licenses the predictive algorithm, the scoring system, from some third party. That third party basically scores the person, tells this big company what the score is. The person being scored can't ask any questions because these guys don't even know how it works. Typically, they have a licensing agreement that stipulates that the big company will never get to see the secret sauce of the scoring system. DEE SMITH: Because it's a trade secret. CATHY O’NEIL: It's a trade secret. So it's really opaque, and it's often unfair. And there's just nothing that the person being scored can really do about it. And at the same time, they're missing out on really important financial opportunities, or job opportunities, or even going to prison. DEE SMITH: Well, this is really a problem. Because people, when they see that something's been done by a computer and that there's an algorithm, this magic word involved, they think, oh, well, that's objective, because the computer did it. People didn't do it. It's number crunching, and it must be right. I don't like the result, but it must be right. And then the people who are in charge of it feel like they've, in some way, moved the responsibility for the decision over to some black box that has some kind of magic secret sauce, whatever you want to call it. But that it's because it's mathematical and algorithmic, it's objective. And so that is just not true, is it? CATHY O’NEIL: It's really not true. But you said it well. I mean, that is the assumption going in. And that's sort of the blind trust, I call it, the blind trust that I'm pushing back against. That's what I do now. I push back against this blind trust that we have. Now, it's true that it's not idiosyncratically favoring certain friends. It's not nepotistic, right? Like, you could imagine a hiring manager who just simply lets their buddies get a job, even though their buddies aren't qualified under the official rules. Algorithms don't do that. They're not particular to a specific person. But they are inherently discriminatory inasmuch as the data that they're trained on is discriminatory. 
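A toy illustration of that last point, anticipating the promotion example that comes next-- all data is invented-- showing how a model fit to biased historical decisions simply reproduces the bias:

```python
# A toy illustration (invented data) of bias replication: in this fake company,
# men were promoted far more often than equally qualified women, and the
# "algorithm" simply learns the base rates in the historical record.

from collections import defaultdict

history = [
    # (gender, qualified, was_promoted) -- past decisions, not ground truth of merit
    ("man",   True,  True),  ("man",   True,  True),
    ("man",   False, True),  ("man",   True,  False),
    ("woman", True,  False), ("woman", True,  False),
    ("woman", True,  True),  ("woman", False, False),
]

promotion_rate = defaultdict(lambda: [0, 0])   # gender -> [promotions, count]
for gender, qualified, promoted in history:
    promotion_rate[gender][0] += promoted
    promotion_rate[gender][1] += 1

def predicted_score(gender):
    promoted, total = promotion_rate[gender]
    return promoted / total

# Two equally qualified candidates get very different scores,
# purely because the past was biased.
print("man:",   predicted_score("man"))    # 0.75
print("woman:", predicted_score("woman"))  # 0.25
```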
So for example, if we train an algorithm to look for people who in the past were given promotions in a company where men got promoted over women and tall men got promoted over short men, then that algorithm would be trained on that data and would replicate that bias. So there's no sense in which it's actually objective. To be objective would mean something else. It would be closer to something like, what does it mean to be good at your job? Let's measure the person's ability to do their job. That would be closer to objective. And that's not what we typically see with algorithms. We're training on success. But success is really a very bad proxy for underlying worth. It's actually difficult to measure somebody's underlying worth, underlying ability. And you can try to get there through various means. But most of those means that we've developed are known to be biased-- biased against minorities, biased against women, biased against the usual suspects. And so what we end up doing is replicating that bias in the algorithms. And in some sense, you could just say, OK, we're just doing what we used to do-- no better, no worse. I would argue that it's actually a little worse. Because now we are also imbuing this stuff with our blind trust. We think we've done our job. And we think, OK, great. Now it's fair. We don't have to worry about it. In fact, we do have to worry about it. DEE SMITH: And there are at least two ways that this goes off-track. One is the people who are sort of selecting the tools and the elements that are going to feed into this, in terms of what's important and what isn't. Or even if you didn't have that, and you're just looking at past patterns-- the past patterns are going to be whatever they are, and if it's machine learning, the machine is learning from that itself. It's still going to reflect the fact that the company only ever hired men. CATHY O’NEIL: Yeah. I mean, you're distinguishing-- and I was kind of conflating those two things. And I think it's really important to distinguish them. There are lots of different ways algorithms can mess up. And one of them is bad data. Like, garbage in, garbage out. If it's biased data, it's going to propagate the bias. That's what we just discussed. But then there's also the definition of success itself. And you can take perfectly good data, but turn it into a garbage algorithm by defining success the way Facebook defined success and making things worse-- making everyone worse off and making democracy dissolve. DEE SMITH: So I mean, you have a few really interesting examples. I heard you talk about teachers once and a thing that happened. I think it was here in New York state? CATHY O’NEIL: Yes. Yeah. DEE SMITH: Walk me through that. CATHY O’NEIL: I mean, nationally, really. It was happening in more than half the states, but mostly in urban centers. So in particular, this was a teacher accountability movement, started by Bush and continued by Obama. So it was bipartisan-- like, let's close the achievement gap. That was the sort of political rhetoric. But really, what it came down to was, like, let's find bad teachers and fire them. And it was mainly, like, a weapon against teacher unions, to be clear. And I bring that up because the definition of success was, does this work as a weapon against teacher unions? And at that, it did pretty well. It didn't work statistically. It was a huge mess. In fact, it was almost a random number generator. It was very, very inconsistent.
Some teachers were given two scores-- and they were scored between 0 and 100-- for the same subject for the same year, but they taught seventh grade math and eighth grade math. They were given what were essentially random numbers, sometimes a 2 and a 98. They were simultaneously supposed to be the best teacher and the worst teacher. It didn't make any sense. It was very inconsistent. And yet it was being used to fire teachers. Hundreds of teachers got fired based on bad scores. DEE SMITH: I know school districts were paying a lot of money to access this. CATHY O’NEIL: Paying a lot of money. And again, it was the same licensing situation. The people that were actually building those scores weren't telling anybody in New York City, in the DOE, how it worked. And yet the DOE was intent on using these scores to decide on people's tenure cases. So it was really bad. The only way I know that it was almost a random number generator is because a New York Post reporter did a FOIA request to get a bunch of the teachers' data-- their names and scores-- and they published them as a way of shaming teachers with bad scores. That was not great, because that was way too much trust. Any trust you put in these scores, they didn't deserve. But then this high school math teacher from Stuyvesant High School, Gary Rubinstein, found that same data, which was now publicly available. And he found a bunch of teachers that had two scores for the same year for the same subject. He did a scatter plot and found it was almost random-- almost random. It's embarrassing. But it goes to your point, which is that people-- as soon as they hear mathematical model, it's like-- sometimes I think about it like, did you see Men in Black? DEE SMITH: Oh, yeah. CATHY O’NEIL: After they meet the men in black, they flash that light. You can't think for a few seconds. You're totally discombobulated. It's like that. It's a flash in your eyes. And people just-- oh, it's a mathematical model. I can't ask questions. I guess I'm not qualified to ask questions, even though it's my job on the line. So just to finish that out, six teachers in Houston, Texas, who were fired based only on their teacher value-added model scores, sued and won. This is exciting to me because it's one of the only examples of a lawsuit that has succeeded in court around an algorithm. The judge found that their due process rights had been violated, because nobody could explain the score. That's how opaque it was. DEE SMITH: So this has really serious implications, not only for society, but also for business. Because what you're describing is a system that simply looks at past patterns and extrapolates that those are going to go forward. In a world that is filled with nothing but change-- CATHY O’NEIL: Exactly. We need better than that. DEE SMITH: It's the problem of induction, really. It's: things will continue as they are until they don't. CATHY O’NEIL: You know what? If we had a perfect world, great, let's propagate the past. We don't, though. You're right. DEE SMITH: We don't. So the role of big data is growing. The faith in big data is growing within business, for example. Talk about business for a moment. That's a very warping thing, for companies to be making decisions that affect shareholder value, that affect their profitability. Even if it's an accurate read of the past, it's assuming that the future is going to replicate the past. And that is just not-- it's less true than it ever was, and it never was true. How do you see that playing out?
CATHY O’NEIL: Yeah. I mean, it's a really important point. And I would go further. I mean, I would say that's true. But even if that's OK with the businesses, even if they're like, actually, we're not predicting who's going to be good at their job in our company, we're just predicting the seasonality of the oil business, and it's better to have a guess than to not have a guess-- in some cases, it's not an ethical question. It's just like, is it accurate? And what I found through my company is that you'd be surprised how many of these big data algorithms are actually not even answering the question that the business people thought they were asking. There's a real disconnect between the business people on the one hand, who think, I've got to get my hands on this AI stuff-- this is the new hot stuff, and it's going to increase profitability and efficiency-- and the data scientists who are being turned out by these data science masters programs, who've been trained to sort of optimize in one way and one way only. They don't really listen that well. The communication barrier is big. So I guess what I'm saying is there's an avalanche of bad algorithms out there. And they aren't just ethical problems. They aren't just, like, discriminatory facial recognition. That is a problem, for sure. And for police to use facial recognition software in court cases when we don't even know if it works-- not OK. But there are also just much more mundane problems, which is that people think that they're asking and answering a certain question, and they're not. But they're trusting it, and they're putting their businesses on the line. DEE SMITH: And other people's money-- the investors in their business. CATHY O’NEIL: Exactly. Exactly. I was talking to one of the big four auditing firms. And they said it to me like this: their job is to financially audit these huge companies, but those huge companies are using algorithms to run their business. And nobody in the company, including the auditing firm, knows how to audit the algorithm. So they can't actually audit, financially audit, those companies anymore, because they are implicitly depending on trust in the algorithms. DEE SMITH: So you do, actually, perform algorithmic audits? CATHY O’NEIL: Yes. DEE SMITH: Walk me through what's involved in that. CATHY O’NEIL: Well, I have a framework. I call it the ethical matrix, where I expand this question, the question we've been asking this whole time-- what does it mean for an algorithm to work? And of course, you always ask the data team first, does your algorithm work? And guess what the data team says? Sure. And then you ask them, how do you know? And they say, well, it's this accurate. They just tell you what they mean by that, and it's always accuracy, or efficiency, or profitability. It's one of those three things. And then you're like, OK, let's broaden that. And then you ask the question, for whom does this algorithm fail? And then suddenly, boom. OK, well, there's the data team. There's the PR team. What if this algorithm is pretty accurate on white men, but really inaccurate on black women? Have you tested that? No. What if The New York Times has a headline saying that is the case? Would that look good for your company? No. That's the kind of question I bring up. And then, of course, the second part is, if there is a concern along those lines, to design a test, or a battery of tests-- which is what my expertise is, designing data tests-- just to answer that question.
Does this work better on white men than black women? It's not actually that hard to design that test. Many of the tests I design are not that hard. Some of them are. Some of them are needing ground truth, like, how do you know you're measuring what you think you're measuring? Let's find this data in a different way and make sure it's consistent. That's kind of the hardest kind of most expensive type of data test that I'd run. I think most of my skill lies in this translation between all the different stakeholders, like what matters the most? What are we worried about? What could go wrong? And then designing the tests to see whether it is really going wrong. DEE SMITH: Do you find that companies that hire you to do that-- for the ethical side of things, are they favoring tall white men over short black women, or whatever, because that's where their PR risk is seen, their headline risk is seen to be, but they don't hire you as much for looking at the processes and the things that they're measuring that their business actually depends on? Or do they hire you for both? CATHY O’NEIL: If you don't mind, I'm going to go through three different types of customers. DEE SMITH: Please. CATHY O’NEIL: OK. So the first type of customer is like, we actually need to know whether this is working, or we know it's working, but we need a third party to trust us. And they're great clients. But their algorithms typically work. So it's not that much work for me. I mean, I verify that they work. DEE SMITH: Could they have gone in with that-- CATHY O’NEIL: They thought through it. They knew it could go wrong, and they thought through it. And usually, they have an investor who wants to know why am I investing in you. Show me it works. Or they have clients, or they have customers, or whatever. The public itself needs to trust them. So they need trust, and I help them gain that trust. The second type of client is like, we got in trouble, and we don't want to get in any more trouble. How do we avoid trouble? And so that's a lot more about rolling out announcements, and talking to compliance lawyers, and that kind of thing, worrying about class action lawsuits, that kind of thing. And then there's a third type of client which I want but I don't have and is the reason I started the company. And those are the people that are building these really massive important potential weapons of math destruction, which is to say scoring systems for health insurance, scoring systems for car insurance, for credit cards, for getting a job. I desperately want to look into those algorithms. The companies that build them and the companies that use them do not feel the heat yet from the regulators. Because these are all unregulated areas-- insurance, hiring. The regulators are so far behind in making sure that the processes in these large companies are compliant, they don't know how to get somebody in trouble for using an illegal algorithm. So the companies don't need me yet. I mean, at some point, I hope they will. At some point, I hope the regulators will write them a letter saying, hey, you're going to get fined millions of dollars, more than your cost of business, unless you prove that what you're doing is legally compliant. The day that letter goes out, my company is going to get some clients. And I'm looking forward to it. DEE SMITH: Well, and we're all going to be better off. CATHY O’NEIL: I hope so. DEE SMITH: So is this because the regulators really just don't even understand these issues? CATHY O’NEIL: Yes, that's right. 
For the most part. DEE SMITH: And it's not that they're understaffed. I'm sure they're understaffed. CATHY O’NEIL: They are understaffed. DEE SMITH: But they don't actually have this in their head yet. CATHY O’NEIL: I mean, that's one of the reasons I go around giving so many talks. It's an education initiative. I spoke to the EEOC. I talked to the CFPB when my book came out. Now, that was before the election. I really, really wanted to work with the CFPB. And they were the most on the edge. They were the most sort of savvy regulator of all. But they have basically been closed down. They're not working on this anymore. So it's sort of like waiting for progress to be made. At the federal level, I'm not holding my breath right now. But at the state attorney general level, and sometimes at the city level, things are starting to move, and that's exciting. DEE SMITH: So do you think there's going to have to be some cataclysmic event-- some company that really runs into a huge issue with one of these things, whether it has to do with an EEOC-type thing or an actual transaction, where they made a business decision based on big data that turns out to be catastrophically wrong? Is something like that going to have to happen, or several somethings like that, for this to register in the public mind? CATHY O’NEIL: Right. And it's like-- OK. So on the one hand, yes. I do think that will have to happen. And on the other hand, I think it's already happening, but it's invisible. My assumption is-- because I know enough statistics to realize that if we're propagating the past, we're probably propagating racism and sexism-- that it's happening. Just take hiring, just hiring. The ecosystem for matchmaking between people looking for jobs and people looking to hire is inherently propagating the past. And I just, as I was coming here, saw-- somebody forwarded me a recent academic paper asking, who gets to see STEM jobs online? Guess what? White men. You know what I mean? I know that that's happening. The question is-- let me give you an analogy. When cars were invented, they weren't safe. We saw people dying on the side of the road. We saw more and more people dying as cars became more and more popular. At some point, we were like, hey, could we see fewer people dying? And then Ralph Nader came around and said, what about seat belts? What about airbags? We actually made it a thing. We turned it into a science. And it wasn't perfect. We were protecting men more than women, because the crash test dummies were men, not women-- whatever. The point is that we realized we needed to do something. Same with airplane safety. My issue right now with algorithmic harm is that I think it's happening, but we're not seeing the dead bodies by the side of the road. So when I have an opportunity-- and I often do, sometimes do-- to talk to policymakers about this stuff, or regulators, I often say, hey, you know what? Don't start with the rules. Start with the dead bodies. Ask the companies using these algorithms to measure their failures-- for black women, for black men, for white women, for white men. I want you to start with whatever the categories are-- for people with mental health status, for people with veteran status. Look at the protected categories. Measure the failures inside versus outside the protected categories. Just ask them for that data. They're not collecting it right now. Those are the dead bodies. I want to know: are the dead bodies concentrated in these protected classes? My guess is absolutely.
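A sketch of what that measurement could look like in practice, with invented records and hypothetical group labels: disaggregate the algorithm's failures by protected category and compare.

```python
# A sketch of the "count the dead bodies" request: given an algorithm's
# decisions and what actually happened, measure its failure rates separately
# for each group. Data and group labels here are invented for illustration.

records = [
    # (group, algorithm_said_yes, actually_succeeded)
    ("white_man",   True,  True),  ("white_man",   True,  False),
    ("white_man",   False, False), ("white_man",   True,  True),
    ("black_woman", False, True),  ("black_woman", False, True),
    ("black_woman", True,  False), ("black_woman", False, False),
]

def failure_rates(rows):
    fp = sum(1 for _, yes, ok in rows if yes and not ok)   # false positives
    fn = sum(1 for _, yes, ok in rows if not yes and ok)   # false negatives
    n = len(rows)
    return fp / n, fn / n

for group in ("white_man", "black_woman"):
    rows = [r for r in records if r[0] == group]
    fp_rate, fn_rate = failure_rates(rows)
    print(group, "false positive rate:", round(fp_rate, 2),
          "false negative rate:", round(fn_rate, 2))
```

In this made-up example the overall numbers look tolerable, but the false negatives are concentrated in one group-- which is exactly the kind of pattern the raw data would need to be collected to reveal.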
But we're not even seeing the dead bodies. So that's the first step. DEE SMITH: It's an anomalous thing to-- in my company, we do private intelligence, mostly in the investment world. And what we try to avoid, or would document and remove from the equation, are false negatives and false positives. And that's really what you're-- CATHY O’NEIL: Start there. DEE SMITH: Yeah, that's exactly what it is. CATHY O’NEIL: Start there. DEE SMITH: And you know, it's stunning to me how many people-- and I'm sure this is true of what you're doing-- they get a system that they make themselves comfortable with. They rely on it, and they're highly resistant to change. Because not only does it mean that they have to go through everything that change entails, but also, it's psychological-- they're basically telling themselves that I didn't understand what I thought I did. They're so emotionally-- CATHY O’NEIL: It's cognitive dissonance. DEE SMITH: It's cognitive dissonance. CATHY O’NEIL: Let me just walk through that in a particular case, with hiring algorithms. I run a company. It's so much work. So many people applied, like 10 people for each position, maybe 100 people for each position. I need a way to filter out these people and get only the qualified people. I hire some company to do that. It seems to work really well. And the people that come through the filter seem pretty qualified. And then I can winnow it down. And it's much, much more efficient than it used to be. Looks like it's working for me. Now, sometimes it doesn't work. I have a false positive. I hire someone who really didn't work out. But I don't really blame the filter. It's still helping me. The false positives I keep track of. But they don't kill me. What am I failing to keep track of? False negatives. Those are the people that should have been hired, that should've gotten through, but were rejected by this system. I don't even think about that. Because it honestly doesn't affect my business. As long as I get enough people who I want, I don't care about the people I should have wanted. But those people care, right? Those people care. And then when confronted with that issue-- hey, what about your false negatives-- the people running the business are like, I'm a good person. That's the cognitive dissonance. Like, I'm just trying to make my business not spend all of its time and money hiring. And they don't think of it as their responsibility. And they won't. They won't until the regulators force them to. DEE SMITH: Right. And of course, the other thing is that the false negatives might hide some kind of a finding-- that maybe there's some class of employee that has certain attributes you're not aware of-- they have degrees in music theory or whatever it is-- and they would be the best employees. But you'll never find them, because they fit a certain pattern that the algorithm, the paradigm, filters out. CATHY O’NEIL: Yes, yes. DEE SMITH: I mean, this is a universal problem, obviously. But we've convinced ourselves with these algorithmic systems that we somehow have a handle on something, when in fact, they've made things worse. CATHY O’NEIL: They could very well have made things worse. And that's what I'm saying. We have no reason to think they haven't. We have no reason to think they're perfect. And once we admit their flaws, the list of possible flaws is long and growing. And we absolutely must start looking into this.
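One hedged sketch of why that takes deliberate effort-- the hiring-filter asymmetry just described, with invented data: a company's own records can surface false positives, but they structurally cannot contain the false negatives.

```python
# A tiny sketch of the asymmetry described above: the hiring filter's false
# positives show up in the company's own records, but its false negatives
# never do, because rejected applicants are never observed. All data invented.

applicants = [
    {"name": "A", "passed_filter": True,  "outcome": "worked out"},
    {"name": "B", "passed_filter": True,  "outcome": "did not work out"},
    {"name": "C", "passed_filter": False, "outcome": None},  # never hired, never observed
    {"name": "D", "passed_filter": False, "outcome": None},
]

hired = [a for a in applicants if a["passed_filter"]]
false_positives = sum(1 for a in hired if a["outcome"] == "did not work out")
print("false positives on record:", false_positives)  # 1 -- visible, and tolerable

rejected_with_known_outcome = [a for a in applicants
                               if not a["passed_filter"] and a["outcome"] is not None]
print("rejected applicants we can evaluate:", len(rejected_with_known_outcome))  # 0
# The false-negative rate is literally incomputable from the company's own data,
# which is why measuring it requires collecting data the business never needs.
```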
And I'll just finish by saying this: I was a data scientist working in, you know, ad tech, and I realized very soon that not only was it similar to what my job had been in finance-- I'm predicting people instead of markets, not very hard to transfer-- but it is exactly the same techniques that I'm using that everyone else is using in data science. So it's not as if one algorithm is going to be wrong in one way, and another algorithm is going to be wrong in a totally different way. No. The things that are easy to figure out with data are class, gender, race, age. Those are the things-- and maybe other kinds of protected class status. Those are the things that we need to worry about. Because if these problems are true for one of those hiring algorithms, probably all of them have similar types of problems. And if we ignore that, then we are just propagating the past into the future, when we could do a lot better. I mean, and that's the other thing. We could do a lot better. These algorithms aren't unsalvageable. We're just being super duper lazy. We could fix them. DEE SMITH: Well, I'm sure we will. And I'm sure you'll be one of the leading people doing it. CATHY O’NEIL: Thank you. DEE SMITH: Thank you so much for talking to me. CATHY O’NEIL: My pleasure, Dee.