The Hacker Mind Podcast: Disarming Document Threats
Phishing is everywhere. Who among us has not seen phish in their inbox? Aviv Grafi, from Votiro, gets into the weeds about how malicious documents are formed and how they might (despite good secure posture) still end up in your inbox or browser. He’s created a rather novel method to strip out the good content from the bad without affecting your overall productivity. And maybe, just maybe, stop phishing as a visible attack vector.
Vamosi: In March 2023, CISA documented a red team activity that simulated a malicious cyber attack against a large critical infrastructure organization with multiple sites. After gaining access and leveraging Active Directory data, the CISA team gained persistent access to a third host via spearphishing emails. From that host, the team moved laterally to a misconfigured server, from which they compromised the domain controller. They then used forged credentials to move to multiple hosts across different sites in the environment and eventually gained root access to all workstations connected to the organization’s mobile device management server. The team used this root access to move laterally to business sensitive-connected workstations.
The team first conducted open-source research to identify potential targets for spear phishing. Specifically, the team looked for email addresses as well as names that could be used to derive email addresses based on the team’s identification of the email naming scheme. The red team sent tailored spear phishing emails to seven targets using commercially available email platforms. The team used the logging and tracking features of one of the platforms to analyze the organization’s email filtering defenses and confirm the emails had reached the target’s inbox.
CISA notes that despite having a mature cyber posture, the organization did not detect the red team’s activity throughout the assessment, including when the team attempted to trigger a security response. Here the activity was to send an innocuous-looking meeting invite and get someone to click the request. In other cases the spear phishing might be an attached single document or spreadsheet.
The point is, no matter how mature your security posture might be, bad actors are embedding malware into various document types. So maybe we should stop looking for all the bad content and simply focus on the good. Files have formats. Perhaps there is a way to extract the good content and place it within a new, secure template for delivery to the recipient in milliseconds, without interrupting your organization’s productivity.
In a moment we’ll hear from someone who’s doing just that: moving the good content to a secure template. I hope you’ll stick around.
[MUSIC]
Welcome to the Hacker Mind, an original podcast from ForAllSecure. It’s about challenging our expectations about the people who hack for a living. I’m Robert Vamosi, and in this episode I’m talking about CDR, which is content disarm and reconstruction. It’s come a long way from its humble beginnings neutralizing malicious documents by making them all PDFs. Now you can perform CDR without ruining productivity.
[MUSIC]
VAMOSI: So after 73 episodes I’ve successfully avoided talking about phishing directly. Part of me feels like phishing is a low-level attack vector, and there are a bunch of security companies offering solutions or training. But the training, as we know from other episodes, isn’t always effective because you have to buy even more training. Ugh. I just don’t want to get into that. So I was skeptical when I was pitched my next guest. He wanted to get into the weeds about how malicious documents might end up in your inbox or browser, and offer a new approach to this old problem.
GRAFI: So I'm Aviv Grafi. I'm the CTO and founder of Votiro. Votiro helps organizations protect themselves from any kind of weaponized content, no matter where it comes from, using a cloud-based service that sanitizes and cleans all content so it can be safely consumed by every employee, no matter where they are.
VAMOSI: Aviv served in the Israeli Army's 8200 intelligence unit and afterward worked as a pentester for many years.
GRAFI: Sure. So I'm based in Tel Aviv, where I've been in the security industry for probably more than 20 years, since I was in high school. I started to be very interested in how things really work and whether I could make them work differently. So I started to do some security research and hacking when I was 16, 17. And when I was getting to around the age of 18, as in Israel every boy and girl does mandatory service for three-plus years, I was recruited by the intelligence forces, by one of the units called 8200, which is a unit that back then was less known, but mainly focused on security operations. And I was doing that for four-something years, mostly offensive and defensive security operations. And one of the things you learn as part of that experience is when you gather a group of 20-something-year-old boys and girls and you tell them something, they're very naive, right? So go, go get this thing. And, you know, when we're getting older, we're getting more cynical, we're getting more hesitant. But when you're 20 years old, you say, okay, we will do that. And there is a magic happening there. And I think that's where I learned that there is nothing that is really impossible, because we were doing things there that just, you know, sounded really impossible.
VAMOSI: So this idea of telling smart kids to get something -- that’s offensive security. That means they’re acting like a red team and actively trying to acquire some data from another source. Most of the time we’re talking about defensive security solutions. Of course, they stop the threats from getting inside your organization. Offensive security, then, is the opposite. This doesn’t get talked about much, so I asked Aviv to explain this further.
GRAFI: Sure. So, as some of you probably know, there are a lot of guys, white hat or black hat, who review security systems and try to get confidential information out of them. And there are some on the other side, the teams that are in charge of, you know, like in that game, defending those systems from being breached and hacked by those guys. So the two roles were being played, interestingly, in the same unit. I was spending a lot of time on defending systems and networks and data. And some of the time I was doing the other role, which is playing that game of fetching information from other systems. So I think that's basically the defensive and offensive.
VAMOSI: So in the traditional red team/blue team world, red teams would be the offensive security. They hack into an organization to test its defenses.
GRAFI: So red and blue teams are more a kind of hacking-with-a-license thing. So I think that's more, I would say, getting a target, but it's not just necessarily like hacking with the license and the access to the actual network. You need to do all the recon, you need to do everything blindly. So I think that kind of attempt is probably harder than [unclear] getting some access or some info. But yeah, in terms of that game going the other way, it's probably close to red and blue team.
[MUSIC]
VAMOSI: By definition, a computer virus, as opposed to a computer worm, needs a malformed document to spread. And in the early days, there were vulnerabilities in Microsoft Word, for example, that early viruses took advantage of. Viruses propagated only when a malicious document was shared from one PC to another. Consider that the Michelangelo virus was spread on floppy disks. The ILOVEYOU virus was spread through email. Part of the problem was the email reader itself, in this case most likely Microsoft Outlook, which allowed files marked .doc and .ppt to be opened without a lot of hassle. No need to disrupt productivity. Microsoft eventually saw this abuse and started parsing the attachments more, prompting dialog boxes before opening files. Still, email systems recognized certain extensions -- like .doc and .ppt -- and knew what applications these were associated with. Yes, but these same systems didn’t really look inside the documents, only at the extensions. And that gets to rules within Windows. There’s also a rule in Windows that executes the exe wherever it pops up in the file name. So you could, for example, have IAMMALICIOUS.EXE.DOC getting permission to open up in Word while it is executing IAMMALICIOUS.EXE.
GRAFI: So if you're thinking about weaponized documents, first let's define what a document is. I mean, we all know that we're exchanging content every day. We've been sharing, for example, that podcast description or lineup or whatever we want to exchange. I just received a PowerPoint presentation from one of my colleagues here, and I'm sharing, for example, a doctor's report with my insurance company because I want to file a claim, and maybe I will send a birthday card PDF to my mum saying happy birthday. So we're all exchanging a lot of content in a lot of forms. Mainly it will be, you know, today, PDFs, Word documents, PowerPoints, Excel spreadsheets, but even images or archives are documents, and those documents have a very specific structure. It's usually described by the vendor that actually created that format. It can be Adobe, which manages the PDF format, or Microsoft, which defines the Office specification. And there's one thing that is really important to know about documents: the reader, the application that reads those documents, is actually parsing them, and as part of that parsing process, the logic of, say, Microsoft Word that reads that .doc or .docx is actually running some code based on what that structure is composed of.
VAMOSI: This is important. There’s the file, which may have a malicious exploit within it, and there’s the reader, which is expecting files of a certain structure that it may or may not get. And guess what? Maybe the reader has a vulnerability too.
GRAFI: And if there is a bug or some issue in the reader of that Word document, and I can alter the structure of that Word document so it will do something that the original developers of Microsoft Word didn't think about, this might result in a vulnerability, or exploitation of a vulnerability, that will eventually execute code.
VAMOSI: All this might sound like ancient history. I mean, formats like RTF, Microsoft Word, PowerPoint and Excel have been around for decades, and mitigations do exist to prevent the wildfires we saw in the early 2000s. And yet these file formats are still being exploited today.
GRAFI: And there was one recent example. Just in the last few days, we had two new CVE vulnerabilities released by Microsoft, and one of them was in RTF documents. Now, who is using RTF documents these days? The reality is that it's only the bad guys. But why are they doing that? Because Microsoft Word supports RTF for backward compatibility, going 20 or 30 years back to when we were using those Rich Text Format documents. So RTF, and in general a lot of outdated or not-that-common formats, are a great playground for the bad guys. So this is, for example, one of the reasons why we're still finding those kinds of vulnerabilities. So take what was just released last week by Microsoft as a patch for the vulnerability: take an RTF document and send it as, let's say, a resume, or even as a birthday card, or even, you know, there were some banks that had some issues last week, right? So, "we actually changed our account, and these are kind of instructions for how to update your clients," and send that as an RTF. And if you send that to an accounting team, maybe my accounting team, well, yes, we are working with some of those banks, and we want to know whether we should comply with something, so they will open it. So I think a weaponized document is not just RTF: every kind of content that follows a strict specification and is parsed by an application can be weaponized and can be used to execute attacks.
VAMOSI: So back to my example, not only could you append .doc, you could just rename the .zip and get it to run. So IAMMALICIOUS.EXE.ZIP could also work.
GRAFI: Yeah, that's right. I mean, you can change the extension, but even if you call it .doc instead of .rtf, it will still be opened by the same application. And of course, you can change the extension, you can change the file name, you can change the icons. But practically, under the hood, that's the exact same specification, which might be malformed or maybe specially crafted to execute code. That can result in remote code execution. So the extension is just a nice-to-have so the file can be recognized easily. But basically, the threat underneath is still there.
VAMOSI: So where I'm getting to is that if I received an email and I saw it was a .doc, I might be like, well, that's cool, I have Word, I can open this. But under the hood, as pointed out, it's a Rich Text Format or some other format. Are there any other ways that someone could tell that a document is malicious without using extra technology?
GRAFI: So that's a tough question. I think that it's easy for the average user to think that a certain file is one format when in reality it's a different format. So it's really hard to do that. I would say that probably saving that document and making sure that there are no trailing extensions, maybe that's something that can be done. But I wouldn't expect that from a lot of employees, because we're asking our colleagues to do their job, and not to try to guess whether something is trying to hack them. And I think one of the approaches I love to talk about is how we can turn security into business-enabling security and not restrictive security. Because if you think about the phishing awareness campaigns that I'm sure I need to do every year, and I'm sure that you do, and all of the audits that need to go through those awareness campaigns: why are we doing that? Because technology has failed to protect us from those kinds of phishing scams. And we tried to move that responsibility to the poor employees. And we tell them, you now need to spot the phishing. And to be honest, even a day after a successful phishing awareness campaign that maybe, you know, resulted in a great score for my department, I could send everyone here, "Hey, guys, this is the new seating plan starting from tomorrow." I'm sure that everyone wants to know where they're going to sit, so they will open it. And if someone would send me, "Hey, there was a problem wiring your paycheck. Please fill in the attached form," I would open the attachment because I want my paycheck, right? So I think that's the kind of fundamental problem: the industry kind of acknowledges that the technology has failed, so we're trying to use humans, or employees, to do that job. So this is a major problem. And I think there are solutions for that, and we need to be open-minded.
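To make the point about extensions concrete, here is a minimal Python sketch that compares a file's name with its leading "magic" bytes. The byte signatures are the well-known public ones; the function names and the extension table are hypothetical choices for this illustration, and a real product would parse far more than the first few bytes.

```python
from pathlib import PurePath

# Well-known leading "magic" bytes for a few formats discussed in the episode.
MAGIC_BYTES = {
    b"{\\rtf": "rtf",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",  # also .docx/.xlsx/.pptx, which are ZIP containers
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",  # legacy .doc/.xls/.ppt
    b"MZ": "exe",
}

# Hypothetical table: the container we would expect for each extension.
EXPECTED_BY_EXTENSION = {
    ".rtf": "rtf", ".pdf": "pdf", ".zip": "zip", ".exe": "exe",
    ".doc": "ole2", ".xls": "ole2", ".ppt": "ole2",
    ".docx": "zip", ".xlsx": "zip", ".pptx": "zip",
}


def sniff_format(data: bytes) -> str:
    """Return a coarse format label based only on the first bytes."""
    for magic, label in MAGIC_BYTES.items():
        if data.startswith(magic):
            return label
    return "unknown"


def extension_matches_content(filename: str, data: bytes) -> bool:
    """True only if the final extension is known and agrees with the content."""
    extension = PurePath(filename).suffix.lower()
    expected = EXPECTED_BY_EXTENSION.get(extension)
    return expected is not None and expected == sniff_format(data)
```

With this sketch, an RTF payload named resume.doc is reported as a mismatch, which is exactly the situation described above: the name says one thing and the bytes say another.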
[MUSIC]
VAMOSI: So we talked about Word documents and PowerPoints. Let's talk about another common document format: PDFs. They were designed by Adobe to be secure because they are locked. Except PDFs aren't more secure. There are vulnerabilities either in the file or in the reader. There are various ways to defeat PDFs.
GRAFI: Yeah, that's correct. I think PDF is one of the most complex file formats that has been used in business. And the reason for that is it's just, you know, been developed for many, many years, with tons of extensions. You can have 3D objects within PDFs. You can embed multimedia within PDFs. You can build books with PDFs. You can have, like, billboards and advertising signs using PDFs. You can do almost anything you want to do with a PDF. It's crazy, and that's why it's so highly complex. And where the format is complex, that's where the bad guys are having a lot of fun.
VAMOSI: So instead of being secure, PDFs have emerged in phishing campaigns. Perhaps because people mistakenly believe these are safe documents to open.
GRAFI: So yeah, I think we have seen that, you know, a lot of times, where PDFs are used as an attack vector. I must say that in the last few years the vendors did a great job of sandboxing the PDF features, so we're seeing fewer successful attacks, but we still see, you know, highly sophisticated attacks leveraging PDFs. I can tell you, for example, we used to have PDFs that open new processes, PDFs that spin up multimedia or 3D objects that really, you know, step on something like an embedded image and do something very sophisticated, as we talked about before. So PDF is a really cool format to play with for the bad guys. And one of the things that is really, really interesting there is the question of, okay, so why can't we just remove all that crazy stuff from PDF? And the answer is that we just can't, because the vendors, the manufacturers of the PDF readers, don't want to, you know, close off their market or reduce their addressable market. If they remove features, those features might not be used in some organizations, but others still need them. So there is some tension, because the application developers want to capture all possible markets, but they're not really thinking about the problem.
VAMOSI: Aviv alluded to images. There’s this technique called steganography, which means to hide in plain sight. The image file formats are perfect for this in that the file specs have a lot of unused space allotted. In 2021, someone included different files within images uploaded to Twitter. Researcher David Buchanan created a tool for generating tweetable-polyglot-png files and posted it on GitHub. By downloading the image and then changing its extension from .png to .zip, you could obtain different information. By downloading another image, you could change the file format to .mp3 and hear Rick Astley’s Never Gonna Give You Up. One could put either text messages or actual files and embed them in an image. That image could be posted to social media so that if you knew to look you could receive the message. Or you could hope that people like the image enough to download it and so infect their machine. I'm curious, is steganography actually used for malicious purposes?
GRAFI: So, actually, I think it's been known for probably 20 years, maybe more, I mean, since the first days of encryption, maybe 100 years ago, that you can hide data or text within images. And this is something that's been done for many, many years. The thing is that it's not just being used to leak information embedded in images; now it's also being used to embed weaponized or malicious code within images. And the way we see it being used by the bad guys, it's usually a supplement to something else. So it can be a script that is being sent, but the actual malicious content is not in that script; the script is just renaming or extracting the malicious part of an image that sits as an attachment next to it. So if you look at that script, it does not connect to the network. It does not do anything malicious. It just, you know, opens an image and saves an image, and that's it. So I think it's usually being done to hide some more sophisticated, I would say, weaponized or malicious code. And we see that more and more, and that's one use of steganography. But the other way to do that is to really exploit a vulnerability in the image reader, which we still see from time to time with GIFs or JPGs and some other formats that have a vulnerability, let's say, in the iPhone reader. You might be exploiting a vulnerability in iMessage by sending an image, and that image would contain malicious code that's embedded in it. So leveraging one of the vulnerabilities in GIF, this is one part, and the rest of the code has been embedded as pixels within that image. So we see that as well.
VAMOSI: So it's kind of like this one-two punch. You could get a document that has an image in it, and the document has one vulnerability, but the image might have a script that helps execute that vulnerability and/or others.
GRAFI: That's correct. You know, the bad guys are trying to avoid that kind of detection. If they were to just, you know, have one trick, they might be caught, and traditional defenses probably would stop that. But they're faster and smarter than us, so, you know, that's the game. So they need to hide those attacks from signature-based detection within some other vehicle. The payload can be stored elsewhere. For example, a Word document has something malicious in it, and it may fetch the rest, the second stage of the payload, from the internet. So there are a lot of techniques to avoid that kind of detection; I think it would probably take more than a couple of hours to describe them. And I think that's one of the reasons why a lot of the traditional detection solutions, like antivirus, anti-malware, sandboxes, even next-gen AV or EDR, all fail in detecting the new and shiny attacks and techniques: because they're relying on history, they're relying on what is known, and they cannot really predict the future.
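As a rough illustration of an image that is also an archive, the following Python sketch flags data that starts with the PNG signature and also parses as a ZIP. It is not the tweetable-polyglot-png tool mentioned above and does not reproduce how that tool places its data; the function name is hypothetical, and the check only covers this one PNG-plus-ZIP combination.

```python
import io
import zipfile

# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png_zip_polyglot(data: bytes) -> bool:
    """True if the bytes start like a PNG and also parse as a ZIP archive."""
    starts_like_png = data.startswith(PNG_SIGNATURE)
    # ZIP readers locate the archive from the end of the file, so a ZIP can
    # still be found when image data comes first.
    parses_as_zip = zipfile.is_zipfile(io.BytesIO(data))
    return starts_like_png and parses_as_zip
```

An ordinary image fails the second check and an ordinary archive fails the first, so only a file that satisfies both readers is flagged.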
[MUSIC]
VAMOSI: There's also a degree of spoofing. You said that you would send out the new seating chart.
GRAFI: Yeah, that's correct. And I get plenty of emails saying, "Hey, please buy some gift cards for me," and I get a lot of those from what I thought was a colleague or manager or even the CEO. So yeah, this is a very common thing. What we do here, instead of trying to spot the phishing, is to really generate content which is safe. As in the article I just put out, one of the technologies is called CDR, which is content disarm and reconstruction, which means let's turn the problem on its head. Instead of trying to look for the bad stuff in the document, we know that the end user is interested in the content, in the actual charts, text, bookmarks, and images. By generating a safe version of that document, by moving the content to a brand new and safe template of the exact same format, we actually generate a document that looks and feels exactly the same, but without that malicious part in it, because we're not looking for any malicious part; it's only the safe content. So CDR is one of the technologies out there. We're not the only company that can provide those kinds of solutions. But this is one example of how we can really think differently about security, and not try to block stuff, and not try to make our employees detect the phishing or spot a phishing campaign, but provide a more business-enabling security.
VAMOSI: So Aviv had this idea of a template that extracts the good content from the bad. Could you tell me a little bit more about that?
GRAFI: Sure. So the idea behind content disarm and reconstruction, or CDR, is that you actually want to turn the problem on its head and not necessarily try to detect the bad part of the document, or say, yeah, there's a signature in that document that I think is malicious. So the idea is not to try to find the bad stuff, but to take the content itself, and the content can be everything that is visible. Let's say I have a resume: I'm interested in the text, I'm interested in maybe the applicant's image or picture, I'm interested in the paragraphs. I'm taking that content and placing it on a fresh Word document or PDF document, in the exact same order, with the exact same colors. And I can deliver that replica within milliseconds, because that process is deterministic. There's no need to run that document in a sandbox and wait for something to happen. It is just replicating a document.
VAMOSI: Wow. That’s simple, right? Wrong.
GRAFI: Obviously, there were some challenges we went through. I think there was the first generation of CDR technology. The first generation was just very naive. It was just taking that document, a Word document, saving it as PDF, and delivering the PDF. Some vendors in the market are doing that. And that results in a very secure kind of document, and there's nothing bad in it, but it's not very usable. Because if I'm expecting a Word document and getting a PDF document, I cannot really edit that. So that usually doesn't work.
VAMOSI: That doesn’t really solve the problem.
GRAFI: So that's where the second generation, or CDR level two, came into play, where we said back then, let's drop anything from the document that might be dangerous. So it can be OLE objects in Office documents. It can be VBA macros, right, that the bad guys love to exploit because it's a piece of software that runs in every document. And let's deliver that. It was okay. But eventually you get to a lot of organizations that say, look, I know that maybe you find that you can drop the VBA macros, but my finance department is still using them. We're getting them from Goldman Sachs or from our partners. We're still getting Excel spreadsheets with macros, because that's how we built our automation 20 years ago and it's still working.
VAMOSI: Why are we still using RTF as a file format? And why are we still using .doc as opposed to .docx? Shouldn’t we deprecate these?
GRAFI: So that will probably happen. The only problem is that a lot of organizations, probably the huge enterprises and governments, still rely on those formats. Because if you think about the most common formats in which I can exchange documents with you, they are probably Office and PDF. And, for example, one of the questions being asked is, okay, don't you think that we're all going to move to Google Docs or one of the other alternatives at some point? Certainly, yes, maybe if it's a consumer, that will be fine. But, you know, with a law firm I'm exchanging Word documents. I haven't worked with any law firm that sent me a Google Doc, because they want to have revision control. They want to have everything, like, documented and sealed. So I think it will still take some time. And if you'd ask why RTF is still supported by Microsoft Word, why can't they just remove it, I think that's the same answer. Because if you think about a lot of document generation software, if you think about insurance companies that have a system that was built 25 years ago and generates a report every midnight, it might be generating an RTF report, and maybe that will be sent to the clients. So they're still using that, and unfortunately, I mean, they cannot really redesign it, or maybe they will be able to redesign it, but it will take some more time. So backward compatibility was always a problem, but it was always a requirement, because there are legacy enterprise organizations out there.
VAMOSI: Again, not great. If the first generation made the documents unusable by changing them to a secure PDF, now we’re stripping out the macros and such without recognizing that some legacy systems still depend on those.
GRAFI: So we need that. So that's where the third generation of CDR, or CDR level three, comes into play, where we understand that we have to preserve all features within those documents. So we take a Word document, including everything, like the track changes and the OLE objects, and do that recursively, even extracting the macros. And we developed a machine-learning-based engine that can detect benign macros, because we know what benign macros look like, right? We know that macros, usually in Excel spreadsheets, are manipulating cells, they change formulas, they may be fetching something, but it's basically around the same area, not necessarily spinning up new processes and deleting files from the hard drive, right? So we know how to do that. And we also integrated that into the third generation of CDR, which is called positive selection, as we select the good parts of those documents. And then we actually generate the exact same format with the exact same look and feel. In fact, the users don't even know that we were there. So they're just opening the document, whether it comes in by email, through Slack or Dropbox, or from the web; it doesn't really matter [unclear]. We have our connectors to do all the processing. And the great part with those kinds of technologies is that they're really fast, as opposed to a sandbox. If you remember those days when we were spinning up a lot of documents in a sandbox, we were waiting for something to happen, and if nothing happened for a few minutes, okay, so maybe that file is fine. The CDR process takes milliseconds as opposed to minutes. So that's the point. From the business perspective, one, you're saving time. The second thing, you know, you're saving a lot of frustration, because you're telling the employees, you can now open any document without worrying. This is a significant value of integrating a technology like CDR.
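The "copy only the known-good content into a fresh template" idea can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the `Element` and `Document` classes, the `ALLOWED_KINDS` list, and the `reconstruct` function are hypothetical, and a real CDR engine has to parse and regenerate actual file formats rather than simple records like these.

```python
from dataclasses import dataclass, field

# Hypothetical allowlist: the visible content kinds we know how to rebuild.
ALLOWED_KINDS = {"paragraph", "image", "table", "chart", "bookmark"}


@dataclass
class Element:
    kind: str
    content: str


@dataclass
class Document:
    fmt: str
    elements: list[Element] = field(default_factory=list)


def reconstruct(original: Document) -> Document:
    """Build a fresh document of the same format from allowlisted content."""
    clean = Document(fmt=original.fmt)  # the new, empty "template"
    for element in original.elements:
        # Nothing is scanned for badness; unknown kinds are simply not copied.
        if element.kind in ALLOWED_KINDS:
            clean.elements.append(Element(element.kind, element.content))
    return clean
```

The design choice worth noticing is that there is no detection step at all: the loop runs once over the input, which is why this style of processing is deterministic and does not need to wait for a sandbox.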
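The "drop anything that might be dangerous" approach can be sketched for macro-enabled Office files, which are ZIP containers that conventionally store their VBA code in a part named vbaProject.bin. This Python sketch is a deliberately naive illustration with hypothetical function names: matching on the part name is an assumption, and a real tool would also have to update the package's content types and relationships so the result still opens cleanly.

```python
import io
import zipfile

# Conventional name of the part that holds VBA code in macro-enabled files.
VBA_PART_SUFFIX = "vbaProject.bin"


def has_vba_project(ooxml_bytes: bytes) -> bool:
    """True if any part in the ZIP container looks like a VBA project."""
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as archive:
        return any(name.endswith(VBA_PART_SUFFIX) for name in archive.namelist())


def drop_vba_project(ooxml_bytes: bytes) -> bytes:
    """Copy every part except the VBA project into a new ZIP container."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for name in source.namelist():
            if not name.endswith(VBA_PART_SUFFIX):
                target.writestr(name, source.read(name))
    return output.getvalue()
```

This also shows why the approach frustrated the finance departments mentioned above: the macro is removed whether or not it was doing legitimate work.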
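The distinction between macros that manipulate cells and macros that spin up processes or delete files can be illustrated with a very simple keyword check in Python. This is not the machine-learning engine described above and is far weaker than one; the keyword list and the function name are assumptions chosen for the example, using VBA calls that start programs, create automation objects, delete files, or download content.

```python
import re

# Assumed examples of VBA calls that reach outside the spreadsheet.
SUSPICIOUS_CALLS = ("Shell", "CreateObject", "Kill", "URLDownloadToFile")

# VBA is case-insensitive, so the match is too.
_PATTERN = re.compile(r"\b(" + "|".join(SUSPICIOUS_CALLS) + r")\b", re.IGNORECASE)


def suspicious_calls(macro_source: str) -> set[str]:
    """Return the lowercased suspicious call names found in the macro text."""
    return {match.lower() for match in _PATTERN.findall(macro_source)}
```

A macro that only sets a cell formula returns an empty set, while one that calls Shell and Kill returns both names. A keyword list like this is easy to evade, which is one reason to prefer a model trained on what benign macros look like.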
[MUSIC]
VAMOSI: Aviv has also done pen testing.
GRAFI: So actually, when I launched Votiro, it was, you know, kind of a different business. It was more services and auditing and pentesting. What I was doing for the first two years, I was traveling around the world and reviewing security configurations and policies and interviewing IT staff, and I was demonstrating to my client what the weak points in his infrastructure were, and usually I was demonstrating how I could hack in. So I would say that this was probably the first two years of the company, and in that time I found that there was one technique that actually was working for me 100% of the time, which mainly involved taking a weaponized document, naming it, for example, as a resume, sending that to the recruiting team of my client, and saying, "Hey, I want to apply to a position. I know Rob, and I'll be happy to provide a reference, so please call me." And on the other side, there's a guy or lady that needs to screen hundreds of resumes a week; that's the job. And they were just opening that weaponized document. And it was just working 100% of the time, and that's where I understood that there's some tension between productivity and security. And that's the point where I came up with the idea of what is today called Votiro.
VAMOSI: As a pen tester, Aviv has stories.
GRAFI: So I think there was one cool technique that I want to share with the audience. One of Votiro’s customers is a large insurance company. They worked with a law firm. That insurance company is our customer, so they would say, okay, we're protected. But that law firm, their partner, actually got hacked. Someone hacked their offices, or at least one of the mailboxes in their offices, by using phished credentials, you know, the usual. But those bad guys did something very, very clever. They actually logged in to the hacked mailbox, and they saw that one of the email threads was with the insurance company that happened to be our customer. They saw that the two sides were exchanging drafts of a contract, so they just replied on that thread, saying, "Hey, I've attached the latest version of our contract. It's in the attached file, and by the way, the password for that zip is this." And the interesting part here is that when that employee at the insurance company received that email, if you'd asked them the usual questions: Do you know the sender? They would say, yes, of course I know the sender, I'm talking with her every day. Were you expecting a document or something from her? Yes, we're expecting a document; we've been exchanging those contract drafts for the last two weeks. So even the phishing awareness campaigns, or training, are not really helpful here. But the interesting part is that those bad guys encrypted that malicious Word document in the zip file. And by encrypting a Word document within a zip file, they actually bypass all traditional security solutions, because those cannot really scan or do anything with a password-protected archive. So eventually, it landed in his mailbox, and he opened that zip file with the password.
Now, luckily, a few months before that we had introduced a new process: we detect those password-protected documents, and we tell the end user, "Hey, there was a password-protected document. You were expecting that, right? If you know the password, type it here." So once the user types the document password, we decrypt the document, we sanitize or disarm it, and then we encrypt it with the same password, delivering that to his mailbox. So luckily, when he opened that document, it was decrypted with the same password and the document was safe. But I think this actually demonstrates the level of sophistication, and how determined they are to really launch a successful attack. They're doing stuff that is really, really impressive. And that's why I believe that we all need to be open-minded about that. And we all need to think about how we can really employ more proactive and business-enabling solutions. Because we cannot really, you know, throw everything on our employees and tell them, you will spot the phishing, right?
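The first step of that process, noticing that an archive is password-protected at all, can be sketched in Python. In the ZIP format, bit 0 of each entry's general purpose flag marks the entry as encrypted, and Python's zipfile module exposes that value as `ZipInfo.flag_bits`. The function name is hypothetical, and the later steps (prompting for the password, sanitizing, re-encrypting) are not shown; the standard library cannot write encrypted archives, so those would need other tooling.

```python
import zipfile

# Bit 0 of the ZIP general purpose bit flag means "this entry is encrypted".
ENCRYPTED_FLAG = 0x1


def password_protected_members(members) -> list[str]:
    """Return the names of archive entries that are marked as encrypted."""
    return [info.filename for info in members if info.flag_bits & ENCRYPTED_FLAG]
```

In practice the input would come from something like `zipfile.ZipFile(path).infolist()`, and a non-empty result is the cue to ask the recipient for the password instead of delivering the archive unscanned.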