
Fighting Spam at the Frontline


Formal Metadata

Title
Fighting Spam at the Frontline
Subtitle
Using DNS, Log Files and Other Tools in the Fight Against Spam
Number of Parts
45
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
After more than 20 years of fighting, the spam problem isn't getting better. Spam has system costs, people costs, and organizational costs, and the costs go up the further along the delivery path it progresses. We can't prevent spammers from spamming, but we can prevent much of it from entering our mail handler. Fighting spam at the frontline (firewall and MTA) is the earliest and cheapest place we can wage the war. Tools and strategies like greylisting (along with whitelisting and blacklisting) and tar-pitting have their place, but are we using them effectively? Is there more we can do? In this talk we'll look at the various strategies we can take to improve our ability to block spam at the MTA without blocking or delaying (or delaying for long) legitimate senders. One of the biggest complaints about greylisting and blocking is the impact on legitimate mail. For low-traffic email domains, delayed delivery and the odd lost email might be acceptable. For higher-traffic domains, or those where timely delivery is critical, effective blocking requires a more active, but automatable, approach. In this talk we'll look at the current state of IP x-listing (whitelisting, greylisting, and blacklisting), and at additional tools and strategies we can use to improve the accuracy and effectiveness of our lists while ensuring timely delivery of email from legitimate senders. We'll also discuss strategies for keeping groups of mail servers in sync with the latest lists. Some of the tools and techniques we'll look at: MTA-specific features like postscreen; using SPF records to whitelist well-known senders; using the mail logs to whitelist outbound recipient domains; integrating feedback from SpamAssassin; using log files to identify bad actors; and the effectiveness of third-party lists.
Transcript: English (auto-generated)
First, you'll all notice that you can't really talk about fighting spam without publishing your own email address on everything you do. You're either confident in your spam-fighting techniques or you're not.
Well, as we'll talk about later, I've owned this domain and that email address for 23 years. So, OK.
Well, it's 1:30, so we'll get going. This is Fighting Spam at the Frontline: Using DNS, Log Files, and Other Tools in the Fight Against Spam. I'm Aaron Poffenberger. A real quick note of thanks, particularly to BSDCan, the sponsors, and the volunteers for making this possible.
Honestly, I think BSDCan is one of the better conferences, with one of the better returns on investment. For a long time I wouldn't go to conferences, because I always felt I didn't get nearly as much out of them as I put in, in terms of time and money. But BSDCan, I think, does an exceptional job. So thanks to everybody who pitches in to make this possible.
Quick note about me: I'm a software developer. I've been an OpenBSD user since about 3.2-ish; I don't remember exactly which version I first installed. And I've been running my own mail server for about 20 years. For a long time I used Sendmail, because that's what was in base.
Then when OpenBSD moved base to OpenSMTPD, I moved there as well. So I've been dealing with spam for a very long time, and the origin of this talk comes from that experience: running my own mail servers, and running mail servers for companies I've worked at. I've been receiving spam for as long as I can remember.
The first couple of spams I got, I was rather naive about it. I would very dutifully respond: "I think you sent your email to the wrong person." It didn't take me long to figure out that that was a dumb move. Probably like everybody, I've been through all the client-side filtering tools.
I've used Bayesian filters. I've used whatever plug-ins are available. I've tried server-side spam filtering with SpamAssassin, usually inside a larger framework. I've used greylisting: a lot of promise, and a little bit of pain. I have experience working with PF. And for the past two years (not this year, but the previous two), I ran a tutorial on using OpenSMTPD to run a mail environment, typically a SOHO one.
I'll be honest up front: if you are the chief spam person for Outlook.com or Google, I probably don't have much to teach you here. You have your own scale problems to deal with.
But then there are a lot of us working at the SOHO level, and SOHO doesn't necessarily mean five users; I mean everyone who isn't an Outlook or a Google. So, the goals. I didn't set out with these goals; they're kind of backported from what I've been able to accomplish. First, I want to be able to block spam before it gets into the MTA.
Content analysis is fantastic, but as everybody knows, it's all heuristics. Does this seem like spam? Maybe, maybe not. Spammers do a fantastic job of weaving in valid data that'll throw off your Bayesian filters.
So I wanted to avoid content analysis. I want to allow my legitimate senders to connect, and prevent the illegitimate senders from connecting. I want as little delay as possible, minimal resource usage, and as much automation as possible. It's kind of a tall order, but I think that, by and large, I've been able to accomplish those goals.
So let's start off with a few definitions. What is spam? This is a fairly typical definition you might see, and I've added a few bits to it. I don't think spam is solely commercial email, although it tends to go that way and tends to be of an advertising nature. One of the key features of spam, though, is that you can't unsubscribe.
And it usually comes from non-canonical senders: you can't do a reverse lookup on them, find out what the domain is, and find out who their mail admin is. They're mostly compromised systems. That isn't to say there aren't spammers who get servers at legitimate locations and get proper IP addresses, but a lot of spam just comes from hither and yon.
What spam is not, though, is the broad category of mail I'd consider fairly legitimate. If you signed up for a newsletter in '97 and you're still getting it, that's not spam. Go push the unsubscribe button, or send an email to Majordomo and get off. If you signed up for a website that happened to be opt-out, and you forgot to click the checkbox to opt out, I don't consider that spam either; there's an easy way to stop receiving those. And there's not much I can do about your crazy Uncle Joe who sends you political email all the time.
Joe is still your uncle. He knows your email address, and you probably do converse about normal things like family get-togethers. But he goes onto websites, occasionally clicks "forward this link to," and sends it to you. In my book, that's still not spam. In other words, just because you find it annoying doesn't necessarily make it spam. It might be junk, but that's a slightly different category.
So let's talk about x-listing, just to get our terms down. Blacklisting is simply blocking delivery from specific IPs. Greylisting is temporarily refusing delivery, usually by returning a 451 or similar, until some criterion is met.
The typical criterion is: I refuse your email with a 451, and I expect you to retry after waiting at least three, five, or 30 minutes, whatever my criteria are. Then I add you to my whitelist.
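The greylisting flow just described can be sketched in a few lines of Python. This is a toy, not OpenBSD spamd or any MTA's implementation: it keys pending entries on the (client IP, sender, recipient) triplet from Evan Harris's original scheme, and the 5-minute minimum delay and 4-hour pending window are illustrative assumptions.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Greylist:
    """Toy greylist keyed on the (client IP, sender, recipient) triplet."""
    min_delay: float = 300.0         # assumed: retries sooner than 5 min are deferred again
    pending_ttl: float = 4 * 3600.0  # assumed: forget triplets never retried within 4 h
    first_seen: dict = field(default_factory=dict)
    whitelist: set = field(default_factory=set)

    def check(self, ip, sender, rcpt, now=None):
        """Return an SMTP reply code: 250 to accept, 451 to defer temporarily."""
        now = time.time() if now is None else now
        if ip in self.whitelist:
            return 250
        key = (ip, sender.lower(), rcpt.lower())
        seen = self.first_seen.get(key)
        if seen is None or now - seen > self.pending_ttl:
            self.first_seen[key] = now  # first attempt (or stale entry): start the clock
            return 451
        if now - seen < self.min_delay:
            return 451                  # retried too soon; the clock keeps running
        del self.first_seen[key]
        self.whitelist.add(ip)          # polite retry: whitelist the client IP
        return 250
```

Note what happens when a retry arrives from a different address in the same sender's pool: it starts from scratch with a fresh 451, which is exactly the large-sender problem discussed below.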
And a whitelist says: always allow this IP address through. It doesn't have to be a forever whitelist. Whitelists can be ephemeral, say a rolling 24 hours, but they can also be permanent. OK, so let's talk about blacklisting a little more. I think this was one of the first simple things a lot of us tried.
I used to block entire regions. There was a period when my rules blocked anything from China: I don't know anybody in China, so I'm not going to get an email from China. I had Korea in that list too. It's simplistic, but that's not really how it all started. It used to be that people ran open relays, right?
That was the kinder, gentler internet. Anybody could connect, anybody could send, and we all sent email all around. That didn't last long, of course. We saw the rise of the real-time blacklists, Spamhaus and whatnot. The problem with blacklists, of course, is false positives. Has anybody ever ended up on a blacklist? You'd go read the FAQ, and I'm not kidding you, this is pretty close to the kind of response you'd get. How did you get my IP? Somebody reported you as a spammer. How do I get off? Go kill yourself. Well, that's not very helpful.
I remember reading lots of complaints years ago from people who got on a list and couldn't figure out how to get off. I think a lot of the real-time blacklists have gotten a little friendlier, or they auto-expire entries after a certain period of time, but it was a real problem for a while. And if you're going to have blacklists, they pretty soon beget whitelists, because you want to make sure the people who should be sending to you can get through. It's a great idea in theory. The problem is: how do you keep it up to date?
Domains do change IP addresses from time to time. Maintaining a whitelist by hand works fairly well for one domain, two domains, maybe ten, but with a hundred or a thousand, how are you going to keep it up to date manually? Greylisting, I think, was one of the first really useful, fairly automatable ways of dealing with spam.
Evan Harris wrote a paper on it in about 2003. It's based on a very simple observation: spammers don't really care whether they successfully deliver. It's the shotgun approach, right? Just fire and forget. And as Gilles Chehade found when he was writing OpenSMTPD, they don't care much about the RFC, and they don't care much about 451s. If they can't deliver here, they'll move on.
OpenBSD first implemented it in 3.5, but it's available in some form on almost every platform, for every MTA.
But there's a problem. The chief complaint usually comes from users: "So-and-so from such-and-such company sent me an email four hours ago. Why haven't I received it?" That's probably the number one complaint. I often consult for small companies, helping them set up their email servers, and that's the number one complaint whenever I implement greylisting. They also complain: "I sent them an email. Why wasn't their address automatically added?" Sometimes that's a configuration issue; sometimes it just doesn't work out.
This last one almost put the nail in the coffin of greylisting for me. Large senders like Google don't have one outbound mail server. They don't have two. They have a thousand; they have entire sub-ranges they send from. So you'd get a connection from one IP address and greylist it, and then they'd retry later from a different IP address, so you'd greylist that one too.
It might take 24, 48, or 72 hours, or never, for a retry to work its way back around to the first IP before its greylist entry expired. Okay, so what's changed to make the various forms of x-listing viable? In some ways, not much.
But SPF is a really interesting change, and I'll explain in a few moments why SPF helps us out. There has also been immense consolidation. A lot of companies aren't running their own mail servers anymore; they're using Google, Outlook, or some managed provider.
Not all of them: a lot of SOHO folks still run their own. I run my own, and a lot of my customers run their own. But for the most part, we've seen a lot of consolidation. The other thing is that spammers still often use these non-canonical senders. And this is a key thing: they also do a lot of bad things from those same IP addresses, and we'll see why that's important.
From these couple of points, we can see that we can begin blocking more spam at the MTA. So let's talk about SPF, the Sender Policy Framework. You may remember when it came out, in the mid-2000s. I think it was Yahoo that came up with it.
It's a very simple idea: I publish a TXT record that indicates the valid IP addresses, ranges, or hosts I'll be sending mail from. This is an example record. The records can be incredibly long; they can include numerous IPs, IP ranges, subnets, and hosts, and they can also include the SPF records of other domains.
So why does that matter? Well, 20-odd years ago, if you did a host lookup on a domain, you'd get back the canonical IPs for it, and you'd get its MX records. There was usually a one-to-one correlation: the MX records told you both who they were going to send as and where they wanted to receive email. That's not the way it is anymore. As I mentioned before, we've seen this divergence. As far as I can tell, none of the large mailers, whether Google, Outlook, Yahoo, or anybody else, uses the same IP addresses for MX records that it uses for sending; they almost always keep those separate.
So an SPF record becomes something like an MX record for outbound mail: it says where a domain is going to send its email from.
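To make the "MX record for outbound mail" idea concrete, here is a minimal Python sketch that splits a single SPF TXT record into the literal networks it authorizes and the names that still need further lookups. It is an illustration under simplifying assumptions (only `ip4:`, `ip6:`, `include:` and `redirect=` are handled; `a`, `mx`, `ptr` and `exists` are skipped), and `parse_spf` is a hypothetical helper, not part of any tool mentioned in the talk.

```python
import ipaddress


def parse_spf(record):
    """Split one SPF record into (pass networks, include targets, redirect target)."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    networks, includes, redirect = [], [], None
    for term in terms[1:]:
        if term.lower().startswith("redirect="):
            redirect = term.split("=", 1)[1]
            continue
        if term[0] in "-~?":
            continue  # fail/softfail/neutral: nothing here is worth whitelisting
        # partition on the first colon only, so IPv6 values stay intact
        mech, _, arg = term.lstrip("+").partition(":")
        mech = mech.lower()
        if mech in ("ip4", "ip6"):
            networks.append(ipaddress.ip_network(arg, strict=False))
        elif mech == "include":
            includes.append(arg)
        # a, mx, ptr, exists would need extra DNS lookups; skipped in this sketch
    return networks, includes, redirect
```

For a record like `v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 include:_spf.example.net ~all`, this returns the two networks plus `_spf.example.net` as something still to walk, and `ipaddress.ip_address("192.0.2.77") in networks[0]` is then the whitelist test.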
So if I have this canonical record telling me which IPs or hosts a domain will send email from, I can automate the building of my whitelist. All that presupposes is that I have a list of good domains, and we'll talk about that in a few moments.
Of course, you need a way to walk those records, and this is where a couple of projects come into play: one that I did by myself, and one that Gilles Chehade and I worked on together. The first is SPF fetch. It actually started before my BSDCan tutorial in 2016; I was doing it on my own server.
I didn't invent this, and I don't know who figured it out first. It's kind of an obvious use of SPF. But you do need a way to walk these records recursively, because a lot of large hosts like Google will have hundreds of them. So as part of my first tutorial I put together a script called SPF fetch, and it does exactly what the name implies: it recursively looks up SPF records, converts them to IP addresses, and spits them out. So now I can create a cron job that walks a known-good list of domains and spits those out at some frequency. The frequency is up to each person.
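The recursive walk can be sketched in Python as follows. This is not the SPF fetch script itself (that one is plain shell); it only follows `include:` and `redirect=` and collects `ip4:`/`ip6:` ranges, whereas the real tools also resolve host names. `lookup_txt` is a hypothetical hook that you would back with a real DNS resolver; taking it as a parameter lets the walk be tested without network access.

```python
import ipaddress


def spf_walk(domain, lookup_txt):
    """Recursively expand a domain's SPF record into a set of ip4/ip6 networks.

    lookup_txt(name) -> list of TXT strings is supplied by the caller (an
    assumption of this sketch): a stub for testing, or a wrapper around a
    real DNS resolver in production.
    """
    networks, visited = set(), set()

    def visit(name):
        name = name.lower().rstrip(".")
        if name in visited:
            return  # include loop, or the same include reached twice
        visited.add(name)
        spf = [r for r in lookup_txt(name) if r.lower().split()[:1] == ["v=spf1"]]
        if len(spf) != 1:
            return  # no record, or several (invalid per RFC 7208)
        for term in spf[0].split()[1:]:
            if term.lower().startswith("redirect="):
                visit(term.split("=", 1)[1])
                continue
            if term[0] in "-~?":
                continue  # not a "pass" result, so nothing to whitelist
            mech, _, arg = term.lstrip("+").partition(":")
            mech = mech.lower()
            if mech in ("ip4", "ip6"):
                networks.add(ipaddress.ip_network(arg, strict=False))
            elif mech == "include":
                visit(arg)
            # a, mx, ptr, exists need A/AAAA/MX lookups; skipped here

    visit(domain)
    return networks
```

A cron job would call this once per domain in the known-good list and write the union of the results, one network per line.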
I tend to do it every 24 hours. You could do it once an hour; I wouldn't do it every 10 minutes, which is probably getting ridiculous. As part of that project, I also created some other scripts. One is called SPF update PF, which takes that output and loads it into PF tables.
One of the nice things about PF tables is that if you replace a table wholesale with a list of IP addresses, it removes the entries that are already there but missing from the new list, and it adds all the new ones. It's a very fast and effective way of updating your table.
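PF does this set arithmetic for you when a table is replaced wholesale, for example with `pfctl -t <table> -T replace -f <file>`. The Python sketch below only illustrates what that replace amounts to, and renders the one-entry-per-line file pfctl reads. The function names are hypothetical, not part of SPF update PF.

```python
import ipaddress


def _nets(entries):
    """Parse addresses/CIDR strings, ignoring blank lines."""
    return {ipaddress.ip_network(e.strip(), strict=False) for e in entries if e.strip()}


def table_diff(current, new):
    """Mimic a full table replace: return (added, removed) network sets."""
    cur, nxt = _nets(current), _nets(new)
    return nxt - cur, cur - nxt


def format_table(entries):
    """Render entries one per line (hosts without /32 or /128), IPv4 first."""
    lines = []
    for net in sorted(_nets(entries), key=lambda n: (n.version, n)):
        is_host = net.prefixlen == net.max_prefixlen
        lines.append(str(net.network_address) if is_host else str(net))
    return "\n".join(lines) + "\n"
```

In practice you would write `format_table(...)` to a file and let `pfctl` perform the replace; computing the diff yourself is only useful for logging what changed.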
The other thing I created was a script called SPF MTA capture, which you can hook into syslog. You may or may not know that syslog doesn't have to write to a file; it can also pipe log lines to a process. So I use this script so that every time the mailer sends an email, it sees the log entries go through. If it sees a successful send, it grabs the recipient's domain, does an SPF walk on it, gets all of its IP addresses, and adds them to another list, which is a little more ephemeral. I assume that when you send email to an address, you want to get email back from that domain for a while.
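A Python sketch of the capture step, assuming an OpenSMTPD-style outbound delivery line such as `... mta delivery evpid=... to=<user@example.org> ... result="Ok" ...`. That log shape is an assumption for illustration; check your own logs and adjust the pattern, since formats differ between MTAs and versions. This is not the SPF MTA capture script.

```python
import re

# Assumed shape of an OpenSMTPD outbound delivery log line; check your own
# logs and adjust, since the format differs between MTAs and versions.
DELIVERY = re.compile(r'\bmta delivery\b.*?\bto=<[^@>]+@([^>]+)>.*\bresult="?Ok\b')


def sent_domains(lines):
    """Yield the lower-cased recipient domain of each successful outbound delivery."""
    for line in lines:
        match = DELIVERY.search(line)
        if match:
            yield match.group(1).lower()
```

Wired to syslog (on OpenBSD, roughly a `mail.info |/path/to/capture` action in syslog.conf, with a hypothetical path), the program would iterate `sent_domains(sys.stdin)` and hand each domain to an SPF walk.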
How long you want to do that is up to you: 24, 72, or 96 hours, a month, whatever makes sense for you. Obviously, at some point you might think to yourself, "We're hitting this domain a lot; I'm going to go ahead and add it to my permanent list."
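The ephemeral list with expiry, plus promotion of frequently contacted domains to the permanent list, might look like this minimal Python sketch. The 72-hour window is one of the values mentioned above; the promotion threshold of 20 sends is purely an assumed illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RecentRecipients:
    """Domains we recently sent mail to, with expiry and promotion to permanent."""
    ttl: float = 72 * 3600.0  # keep a domain 72 h after the last send (pick your own)
    promote_after: int = 20   # assumed threshold: this many sends makes it permanent
    last_sent: dict = field(default_factory=dict)
    sends: dict = field(default_factory=dict)
    permanent: set = field(default_factory=set)

    def record_send(self, domain, now):
        domain = domain.lower()
        self.last_sent[domain] = now
        self.sends[domain] = self.sends.get(domain, 0) + 1
        if self.sends[domain] >= self.promote_after:
            self.permanent.add(domain)

    def expire(self, now):
        """Forget domains not sent to within ttl; their send counts reset too."""
        for domain in [d for d, t in self.last_sent.items() if now - t > self.ttl]:
            del self.last_sent[domain]
            del self.sends[domain]

    def active(self, now):
        """Domains whose SPF-derived addresses should be whitelisted right now."""
        self.expire(now)
        return set(self.last_sent) | self.permanent
```

Counting only sends within the window means a domain has to be contacted repeatedly in a short period before it is promoted; counting lifetime sends instead is an equally reasonable choice.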
This group of scripts is all in plain old shell. That's part of the reason why, if you're running Outlook.com, you really need to look at rewriting them. Take the techniques, not the scripts. Some of them are... Yeah, nothing horrible. This is horrible.
Okay, so last year I updated the scripts, made them a little better, published them, and pinged Gilles about them. He said, "Hey, wow, what a coincidence. Theo just asked me to add something like this to smtpctl," for the exact same reason: a need for a list of good IP addresses to whitelist, and SPF records are the right way to get that.
So Gilles and I collaborated on it. I'll be honest: he did most of the coding. He'd already done a lot of the resolver work in OpenSMTPD, so he grabbed that code and packaged it up. I added a couple of small things to it, and then he imported all that code into smtpctl. I maintain a standalone version for folks who aren't using OpenSMTPD and aren't on OpenBSD, where smtpctl is in base. At the end of the slides, there's a bunch of links you can click through.
All this stuff is on my GitHub. The standalone version has basically the same features as SPF fetch, and it's really fast. Here's what example usage of SPF walk looks like. You call it with one or more domain names on the command line, or pipe them in over standard input, however you want to do it, and it iterates over them and spits out all the addresses in whatever form they have in the record. If the record gives an IP address, you get an IP address. If it gives a range, like here with the slashes, you get the range.
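Because the walker's output mixes bare addresses and CIDR ranges from both address families, parsing each line is a sturdier way to separate them than matching on characters, and it also rejects malformed lines with a `ValueError`. A small Python sketch; `split_by_family` is an illustrative name, not an option of any of the tools above.

```python
import ipaddress


def split_by_family(lines):
    """Split walker output (bare addresses or CIDR ranges) into IPv4 and IPv6 networks."""
    v4, v6 = [], []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        net = ipaddress.ip_network(entry, strict=False)  # raises ValueError on junk
        (v4 if net.version == 4 else v6).append(net)
    return v4, v6
```

Fed with `sys.stdin` at the end of a pipeline, it yields two lists that can be written to separate tables.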
It does resolve host names. It always resolves host names, so it'll never spit out anything but IP addresses. The standalone SPF walk and SPF fetch both accept -4 and -6 flags, so you can get only one address family or the other. I don't think smtpctl does that at the moment, but it'd be trivial to filter the output: if you didn't want IPv6, just filter out lines containing colons, and if you didn't want IPv4, filter out lines containing dots.
Okay, I've given you a lot of background information, so how do I take all of this and turn it into a set of techniques for actually preventing spam? The first part is that you've got to do something with your firewall. Since most people here are probably BSD users, you have access to PF. Excuse me, I've got a cough.
I've lost my voice.
Okay, give me just a second. We need to configure our firewall. The key thing is that the whitelist should always win. If you're going to blacklist (and we'll talk about blacklisting in a few moments), you want to make sure that if one of your known-good senders somehow gets into your blacklist, it can still get through. You could take the hardcore view and say that if you're in the blacklist you always lose, but I've found that's not very effective. When we talk about blacklists, I'll explain why.
You're going to want to expire your blacklisted IPs fairly regularly. I've found 24 hours is usually long enough. Peter, I think that's what you do: a rolling 24 hours.
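The two firewall rules of thumb (whitelist always wins; blacklist entries expire after a rolling 24 hours) can be expressed as a small Python decision function. This only models the policy: in PF itself you would typically get the precedence from rule ordering, for instance a `pass quick` rule for the whitelist table ahead of the blacklist block, and `pfctl -T expire` can roughly age out table entries. Treating unknown senders as "greylist" is an assumption of this sketch.

```python
from dataclasses import dataclass, field

BLACKLIST_TTL = 24 * 3600  # the rolling 24 hours described above


@dataclass
class FrontlinePolicy:
    whitelist: set = field(default_factory=set)
    blacklist: dict = field(default_factory=dict)  # ip -> time it was last (re)added

    def blacklist_ip(self, ip, now):
        self.blacklist[ip] = now  # re-adding a repeat offender restarts its 24 h

    def expire(self, now):
        for ip in [ip for ip, added in self.blacklist.items() if now - added >= BLACKLIST_TTL]:
            del self.blacklist[ip]

    def decide(self, ip, now):
        self.expire(now)
        if ip in self.whitelist:
            return "pass"      # whitelist always wins, even over the blacklist
        if ip in self.blacklist:
            return "block"
        return "greylist"      # assumed default for unknown senders
```

Because re-adding refreshes the timestamp, an address that keeps misbehaving stays blocked indefinitely, while a one-off false positive falls out within a day.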
Right, and they'll constantly be re-adding themselves, and we'll talk about how I discover who should be re-added. I was going to mention this later on, but Peter uses a lot of the same techniques; if you read his blog, you'll see a lot of similar ideas. So: whitelisting the known good mailers.
Who is a known good mailer? For our purposes, thinking back to the definition of spam, it's mailers who play by the rules: places that have proper unsubscribe links. But it is a little bit timey-wimey (thank you), an "I know one when I see one" kind of thing.
There are a few that fall into that grey area, where you're not entirely certain who's the bad guy and who's not. But Gmail, Microsoft, Fastmail, and even Yahoo qualify, although I've sometimes found Yahoo in my blacklist, and I'm curious how they got there. I'll see things that shouldn't be happening from domains owned by Yahoo, but that could have just been a fluke, someone renting a server or something. So in the SPF fetch project, I have a list of what I call the common domains.
I found the list on another site; there's a link in the SPF fetch project that tells you where I got it. I also add my own. For example, my bank doesn't happen to be in this giant list of known good mailers, but I trust my bank and know they're not spamming me, so I add a few ad hoc. For the most part I just use these lists, but I also use bgp-spamd. If you're not familiar with it, it's a project started by Peter Hessler that uses BGP to distribute both whitelists and blacklists, and we'll talk a little more about using it effectively.
The other thing I do is watch outbound mail. I mentioned the SPF MTA capture script earlier; that's really effective for making sure domains get added. One thing I think I need to add: sometimes I'll see an IP address come in and get greylisted. It properly goes through and gets whitelisted.
What I really need to do is then take that IP address, do a reverse DNS lookup on it, find out which domains are associated with it, and then do an SPF walk on those domains and pick up all of their IP addresses. That would be very effective, because, again, a lot of mailers use more than one IP address for outbound mail.
So once you've whitelisted one IP address, you really have to find out which others are associated with it. The big problem with that, of course, is getting the list of all the domains that might use that host as a mailer. But since we're only filtering on IP addresses, it should be good enough.
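The reverse-DNS-then-SPF-walk step could look roughly like this. It is a hypothetical sketch: the PTR lookup and the SPF walker are passed in as functions (`socket.gethostbyaddr` is the standard-library PTR lookup; any SPF walker would do), and `registered_domain` simply keeps the last two labels of the hostname, which is wrong for names under suffixes such as `co.uk`. As the talk says, a PTR record yields one hostname, not every domain that uses the mailer.

```python
import socket
from typing import Callable, Set


def registered_domain(hostname: str) -> str:
    """Naive guess: keep the last two DNS labels (mishandles e.g. co.uk)."""
    labels = hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def ptr_lookup(ip: str) -> str:
    """Standard-library PTR lookup; raises socket.herror when none exists."""
    return socket.gethostbyaddr(ip)[0]


def expand_whitelist(ip: str,
                     ptr: Callable[[str], str],
                     walk: Callable[[str], Set[str]]) -> Set[str]:
    """Return the whitelisted IP plus every network from its domain's SPF."""
    try:
        hostname = ptr(ip)
    except (socket.herror, socket.gaierror):
        return {ip}                 # no PTR: keep just the original address
    return {ip} | walk(registered_domain(hostname))
```

In real use this might be called as `expand_whitelist(ip, ptr_lookup, walker)` for each address that graduates from the greylist.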
So who are the bad actors? Well, again, for our purposes, mailers who don't play by the rules. First off, are they blacklisted by trustworthy sources? And that's by your own definition of trustworthy. I trust bgp-spamd. I trust the default entries in spamd on OpenBSD.
I trust Peter's list. So there are places that I go to, but you may have your own. There's also anybody who sends email to your spamtrap addresses, if you've set up spamtrap addresses. That's another good source. But I also look for people who are doing things
they probably shouldn't be doing, and I'll show you some examples of that. So, briefly, our trusted blacklists: NiX Spam, bgp-spamd. There are so many of these that I'm not going to try and name them all; Google around and you will find good blacklists. I also do a bit of log file mining,
looking for bad actors. The theory behind this is that if someone has compromised a machine and is sending spam from it, they can also use that machine to do all kinds of other things. So I start looking through the httpd and sshd logs for strange behavior from IP addresses.
So, for httpd: if you want to go really broad, look for all 403 and 404 errors. Again, that works if your whitelists are good. For example, my mail server doesn't serve any HTTP at all, but I leave the HTTP daemon running.
So I figure if you're connecting to that specific server for almost any reason, you're probably up to no good. And if it's Google's crawler, it's probably not going to be one of the IP addresses they send mail from, and Google's already in my whitelist, so I'm probably safe to just go ahead and blacklist anybody who connects to that HTTP daemon.
However, if you want to be a little more conservative, scan your logs. Look for people looking for things like phpMyAdmin or MySQLAdmin. These are just a few that I found in one quick scan of my logs. On my mail server, I actually
go with the broad scope. You connect to my mail server on HTTP, I don't care who you are, you're going in the blacklist. It's basically a "kill them all, let the whitelist sort it out later" kind of solution to the problem. But remember, if it were my regular HTTP server,
then, right, I would go for something more like this narrower scope. Part of my theory is that
nobody should be connecting to phpMyAdmin on any of my servers, and probably none of your servers either. Right, but it's just an example. Even if you're running WordPerfect on your website, and you've got a wp-admin, or WordPress, wow.
Really, really dating myself there. I've been calling WP WordPerfect for 25 years, and it isn't going to end now. Sorry, WordPress, y'all are... Even if you were running WordPress on your server, thank you, I would probably tend to use
some sort of pre-authentication, so you'd have to authenticate before you could even get to that, or I would be doing IP filtering anyhow. So again, anybody who's looking for that URL is probably up to no good in my book. I'm especially looking for the... I like this one.
I had never seen this one until recently. This is one of my favorites, because y'all have heard Linksys has been having some interesting issues. When I saw that one in my logs, I liked it. Again, remember, if your firewall rules are set up correctly, you're going to be whitelisting all of the good senders anyhow.
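The three httpd scopes described here (any connection to a mail-only host, any 403/404, or only probes for admin paths) can be sketched as a single filter. This is a hypothetical illustration rather than the speaker's script: the regular expression assumes a common/combined-style access log, possibly prefixed with a server name, and the probe list contains only the paths named in the talk.

```python
import re
from typing import Iterable, Set

# Assumed log shape: common/combined style, optionally prefixed by a server
# name. Verify against your own httpd logs.
LOG_RE = re.compile(
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

# Probe paths named in the talk; extend with what shows up in your logs.
PROBES = ("phpmyadmin", "mysqladmin", "wp-admin")


def http_bad_actors(lines: Iterable[str], scope: str = "probes") -> Set[str]:
    """Return client IPs to blacklist.

    scope="any":    every client (a mail host that serves no web content)
    scope="errors": clients that received a 403 or 404
    scope="probes": clients requesting well-known admin paths
    """
    if scope not in ("any", "errors", "probes"):
        raise ValueError(f"unknown scope: {scope!r}")
    hits: Set[str] = set()
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue                # skip lines that don't look like requests
        if scope == "any":
            hit = True
        elif scope == "errors":
            hit = m.group("status") in ("403", "404")
        else:
            hit = any(p in m.group("path").lower() for p in PROBES)
        if hit:
            hits.add(m.group("ip"))
    return hits
```

As the talk stresses, this only stays safe if the whitelist is consulted first, so a blacklisted address that is also whitelisted still gets through.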
I have a script that's an example of this log-scanning, brute-force stuff, and I add those IPs to the bruteforce table. Okay, sshd. Should anybody on Earth be trying to connect to your sshd instance as root?
Period. End of story. Let me give you a hint. If the answer is yes, okay? So the broad scope would be any failed login attempt, but better would be any user that's not in your AllowUsers or AllowGroups directive in sshd_config. That's probably a much more targeted view.
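The three sshd scopes just described (any failed login, users outside AllowUsers, root only) can be sketched as below. The regular expressions assume the usual OpenSSH "Failed password for ..." and "Invalid user ..." log messages. Exact wording varies by version and configuration, so check them against your own auth log. This is an illustration, not the speaker's script, and `allowed_users` is a hypothetical stand-in for whatever your AllowUsers line contains.

```python
import re
from typing import Collection, Iterable, Set

# OpenSSH sshd message shapes as commonly logged; verify against your logs.
FAILED_RE = re.compile(
    r"Failed \S+ for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+) port \d+")
INVALID_RE = re.compile(r"Invalid user (?P<user>\S*) from (?P<ip>\S+)")


def ssh_bad_actors(lines: Iterable[str], scope: str = "root",
                   allowed_users: Collection[str] = ()) -> Set[str]:
    """Return source IPs of unwanted sshd login attempts.

    scope="failed":     any failed or invalid-user attempt
    scope="disallowed": attempts for users not in allowed_users
    scope="root":       attempts to log in as root
    """
    if scope not in ("failed", "disallowed", "root"):
        raise ValueError(f"unknown scope: {scope!r}")
    hits: Set[str] = set()
    for line in lines:
        m = FAILED_RE.search(line) or INVALID_RE.search(line)
        if not m:
            continue                # successful logins and other noise
        user = m.group("user")
        if (scope == "failed"
                or (scope == "disallowed" and user not in allowed_users)
                or (scope == "root" and user == "root")):
            hits.add(m.group("ip"))
    return hits
```

The narrow "root" scope is the one the talk recommends for cutting down a large number of addresses with very little risk.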
But if you really want to go narrow, I found that just grabbing IPs for people trying to connect as root on my server eliminates a massive number of IP addresses every single day. I'll talk about this when I get to automation, but every day I get an email of all the IP addresses
that have been blocked for the coming 24 hours. The routers are all also BSD-style machines, and run scans like you're talking about.
And if you black-hole route people looking for SSH, you instantly spread that across your entire organization if you're running OSPF. So it's easy: without actually even coordinating all of your logs, if each machine publishes a black-hole route for any miscreant for 24 hours or 48,
whatever your time is, OSPF is a wonderful way to distribute that. And it's not even just mail-related. That means that miscreant is blocked across the organization via OSPF. And there is a huge debate in the community about whether you should black-hole them or let people connect to something like spamd and waste the spammer's time.
But my theory on that is look at the compromised machines. Am I really hurting them that much? Probably not. But it's up to you. Whatever you want to do. It's six of one. It's your server. You get to do what you want to do. That's always my philosophy. But that is a good point. And we'll talk about how to share lists. And OSPF is a great way to do that.
They have to have keys. Right. Failed root logins.
Yeah, that would still be it. Failed logins of any sort. Right. Yeah. And if you're running PF, you can also set it up so that after n failures they go straight into the blacklist. So there's ways of... If another has any issues,
it means they never get to scan the whole rest of that block. They're already blocked before they go very far. Okay, so obviously you're going to want to do a bit of testing with this. Whitelists are mostly safe. Blacklists are where you can get into a little trouble. If I am running any kind of blacklisting, I always make sure
that key IP addresses I know need to connect can always get through, no matter what. So a little bit of safety there. Greylists are mostly safe, but monitor your logs. Make sure that someone isn't getting into the greylist and never getting out. Or if they never do, ask yourself why.
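One way to act on "make sure someone isn't getting into the greylist and never getting out" is a small daily report like the following. It is a sketch under stated assumptions: the greylist is represented as a mapping of IP to first-seen Unix time and the whitelist as a set of IPs (how you export them, for example from spamdb or a pfctl table listing, is up to you), and the six-hour threshold is an arbitrary illustrative value.

```python
from typing import Iterable, List, Mapping, Tuple


def stuck_in_grey(grey: Mapping[str, int], white: Iterable[str], now: int,
                  max_age: int = 6 * 3600) -> List[Tuple[str, int]]:
    """List (ip, age_seconds) for greylisted IPs older than max_age that
    never reached the whitelist, oldest first."""
    whitelisted = set(white)
    stuck = [(ip, now - first_seen) for ip, first_seen in grey.items()
             if ip not in whitelisted and now - first_seen > max_age]
    return sorted(stuck, key=lambda item: item[1], reverse=True)


def report(stuck: List[Tuple[str, int]]) -> str:
    """Plain-text body suitable for piping into a daily email."""
    return "\n".join(f"{ip}\tfirst seen {age // 3600}h ago, never whitelisted"
                     for ip, age in stuck)
```

Most entries in such a report will be spammers that never retried; the occasional legitimate mailer that shows up is the one worth investigating.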
Usually an IP address that appears in your greylist but never makes it to white is a spammer, because they don't try a second time. But every once in a while I've had someone complain about a mailer that couldn't get through, and I'll go through and look, and it's still baffling to me sometimes why they can't get through.
But definitely monitor your logs. If you're using PF, then pfctl -t tablename -T show is your friend for that. Just pipe it to an email and send it to yourself every day, or put it in SQL Server or MySQL, something like that. And then you'll want to automate,
obviously, and remember that was one of my key goals for this project: to ensure that it can be somewhat automated. So all of my scripts, in one way or another, run from cron. Some of them I put in /etc/daily.local. Some of them are separate cron jobs; it just depends on when I want them to run.
So I've got some that run every 15 minutes for some of the log scanning, because I want to catch things as quickly as possible. But most of my scripts work on a 24-hour rolling basis. Now you might be wondering, well, how do I ensure that an IP address I see at this time gets expired out? After every IP address,
I put a comment containing a Unix timestamp, and then all I've got to do is filter out all of the entries that are older than 24 hours. That's how I keep my rolling window. One of the nice things about PF is that when you load a table, it ignores comments. So you just load them in and boom, you're done.
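The timestamp-in-a-comment technique can be sketched in a few lines. This is a minimal illustration, assuming one entry per line in the form `ADDRESS # UNIXTIME`; treating lines without a parsable timestamp as permanent is a design choice of this sketch, not something the talk specifies.

```python
import time
from typing import Iterable, List, Optional

WINDOW = 24 * 3600  # the talk's 24-hour rolling window


def stamp_entry(ip: str, now: Optional[int] = None) -> str:
    """Format a table line as 'ADDRESS # UNIXTIME'."""
    now = int(time.time()) if now is None else now
    return f"{ip} # {now}"


def prune(lines: Iterable[str], now: Optional[int] = None,
          window: int = WINDOW) -> List[str]:
    """Keep entries whose timestamp is inside the window.

    Lines without a parsable timestamp are kept (treated as permanent);
    blank and comment-only lines are dropped.
    """
    now = int(time.time()) if now is None else now
    kept: List[str] = []
    for line in lines:
        addr, sep, comment = line.partition("#")
        if not addr.strip():
            continue
        try:
            stamp: Optional[int] = int(comment.strip()) if sep else None
        except ValueError:
            stamp = None
        if stamp is None or now - stamp <= window:
            kept.append(line.rstrip("\n"))
    return kept
```

The pruned lines could be written back to the table file and reloaded with `pfctl -t <table> -T replace -f <file>`; as mentioned in the Q&A, `pfctl -T expire` is an alternative that ages entries out inside PF itself.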
That's right. Actually, I think I do. Not to say that. I'm using the timestamp somewhere. Maybe another script. But you're right, because I do use pfctl -T expire.
Yeah.
So the gentleman mentioned having temporary whitelists that you can add folks to. And so you need a way to share these lists across your servers. OSPF, as the gentleman here mentioned, is one way. bgp-spamd is a great example
of how you could use BGP to share blacklists and whitelists across your various servers. If you're running a really small SOHO mail server, you probably don't have anybody to share with. But you can still use it: since I have small clients that I sometimes work with,
I can share between them, and so I can build a much bigger list because I've got access to more servers. The more servers you have access to, the more information you can gather. The other nice thing you can do is pull from bgp-spamd once and then distribute amongst all of your hosts, so that you don't have to hammer Peter's server.
I kind of jokingly point out if you didn't go to Peter's tutorial a couple days ago, BGP is not hard to learn for this purpose. If you wanted to use BGP for what it was designed for, you really need to spend some time studying it. But to use it as a mechanism for distributing whitelists and blacklists, it doesn't take a whole lot
of study to learn how to do it. And I've stumbled across tutorials on the internet where people are doing exactly this. Okay, so one of the objections when I tell people about what I'm doing here is, well, isn't there a risk of blocking legitimate senders? You've been paying attention.
I keep hammering on this: make sure your whitelists always win. That's really your failsafe. So I really don't care what goes up in my blacklist, because, one, my mail server is orthogonal to everything else that's going on. I don't run HTTP on the same server,
or at least to serve things. Like I said, I do run it as a way of collecting bad actors, but I don't use it for anything. So I keep my server completely orthogonal. And so for the most part, you're pretty safe there. And then some obvious questions
since I've been talking a lot about OpenBSD: you'll notice there weren't a whole lot of OpenBSD-centric ideas here. You don't have to run OpenSMTPD to do this. You can use Postfix. You can use Sendmail if that tickles your fancy. You do need something like spamd. As far as I know, spamd has not been
ported elsewhere, but there are spamd-like plug-ins for many of the MTAs. Unfortunately, they don't sit in front of the MTA; they're a filter in the MTA, so the MTA still has to accept the connection and then shunt it off. There is one exception we'll talk about in just a few moments: Postfix has a really interesting tool that
works in front of Postfix. Rspamd has a greylisting module. Again, you don't have to run OpenSMTPD to use smtpctl, but if you don't have it already installed, you can get the spfwalk standalone version that I maintain,
or you can get SPF-Fetch. Those are the only three tools I know of that actually do recursive SPF walking, but none of them are OpenBSD-centric. Some other interesting options: postscreen is a really interesting tool for Postfix. It's kind of a Swiss Army knife
version of everything we've been talking about today. It has more heuristics. I'm not going to try to describe them all; I found there wasn't enough documentation for me to get a really good feel for it. But it does work in four layers. The first layer is roughly your greylisting.
The fourth layer is things like amavisd or Rspamd. The cool thing is it accepts connections on port 25 and then hands them off to Postfix. The catch is that it doesn't work with your favorite mailer, because it's handing off a file descriptor. If your mailer doesn't know
how to receive a file descriptor and take over the connection, you're done. As far as I know, postscreen will only work with Postfix, unless someone goes in and does some rewriting. And then, of course, we all know about the post-MTA frameworks like amavisd and Rspamd. They're both really fantastic.
The chief difference between them is that Rspamd is written in C, whereas amavisd and a lot of its components are Perl. Rspamd is highly performant. A lot of people are using amavisd; actually, it's amavisd-new, in case you're not familiar with that. It's good enough. The one thing I would suggest: if your environment
is not purely Linux, macOS, or one of the BSDs, you probably still need to run one post-MTA scanner, and that would be something like ClamAV. The other thing I would suggest, if you're not doing DNS filtering for
known bad actors: OpenDNS is a great tool for that. It's fairly inexpensive, and they have a free tier for home, family, and small office. Basically, if it sees a request for a known phishing site or something like that,
it will redirect the user to a landing page that you can configure, telling them, hey, go see IT about this. You might be wondering, will DKIM and DMARC help me out at all with filtering spam? The answer is, broadly, kind of no. The reason why is that they're
not really designed for that. SPF, as we talked about, is sort of the inverse of MX records: it says who is allowed to send email on behalf of the domain. DKIM is just signing outbound mail. You publish your public key in DNS, and then receiving servers can look it up,
check the signature, and say, okay, not only did this come from an IP authorized by this domain, but it came from the authorized MTA for that domain. That's really critical for spoofed email. But guess what? If you're running spambots.com, you can publish a DKIM record.
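To see why a DKIM pass says little about spam, note where a verifier gets the key: from the signer's own `d=` domain and `s=` selector in the DKIM-Signature header, looked up as a TXT record at `selector._domainkey.domain` (RFC 6376). The minimal sketch below only parses the tag list and builds that lookup name; it does not verify anything. The `spambots.example` domain and `mail2018` selector are made up for illustration.

```python
from typing import Dict


def dkim_tags(header_value: str) -> Dict[str, str]:
    """Split a DKIM-Signature value into tag=value pairs (RFC 6376 tag list).

    Whitespace, including header folding, is removed from values, which is
    fine for the d= and s= tags used below.
    """
    tags: Dict[str, str] = {}
    for part in header_value.split(";"):
        name, sep, value = part.partition("=")
        if sep:
            tags[name.strip()] = "".join(value.split())
    return tags


def dkim_key_name(header_value: str) -> str:
    """DNS name whose TXT record holds the signer's public key."""
    tags = dkim_tags(header_value)
    return f"{tags['s']}._domainkey.{tags['d']}"
```

Because the spammer controls both `d=` and the DNS zone it points to, a valid signature only proves the mail came from whoever controls that domain.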
I've seen this in my spam before. I've gone through and looked at it: wow, nice DKIM signature there. It's still spam, so it doesn't help you much. Plus, the point of that is... You're right. It's fantastic for helping suss out spoofing. Actually, with DMARC, the nice thing is that it takes both SPF and DKIM
and allows you to publish records for what to do with failed email. Yeah, that's the key thing. I think you're badmouthing DMARC. No, I'm not badmouthing DMARC. What I'm saying is...
Right. But if I'm running spambots.com, and I'm... Right. This is true.
But my point is for pre-MTA filtering, DMARC doesn't help you much. For pre-MTA. This is correct.
What it lets you do is build better whitelists and in an automated way. A lot of spam. That's right. Now, that I agree with 100%.
DMARC is very effective for that. So, yes. But my point is, I'm most interested in blocking spam before the MTA. I think DMARC will help a lot. But for this purpose, for this specific goal,
DMARC isn't going to help you much. But what it does do is it helps you with your own domain. It helps you immensely. One of the things I'd really hope that DMARC would help out with was that companies like Google or Outlook, whatnot, would whitelist other domains more quickly. To this day,
outlook.com sends emails from my domain to most people's junk mail. I'll give you all a tip. If you ever change IP addresses and you've got a good IP address with a great reputation, overlap them. Bring up the new IP address first, put it in your SPF records, and run both for a month or two,
because a lot of domains will transfer reputation from one IP address to another if they see it associated with a good IP. I made the mistake, when I recently switched providers, of immediately cutting off the old IP address, thinking that my domain had the reputation.
Huge mistake there. It's the IP address that carries the reputation; it should be the domain. I've had the domain for 20-plus years. But, again, I'm not badmouthing DMARC. I just don't think that for these purposes it's the precise answer. It's an answer.
We should push for it, but it's not going to solve the problem of basic unsolicited spam being sent from hither and yon.
Let's look back on this in 10 years and see what DMARC has actually done for us. But I do agree rollout is a huge part. Well, the only way we're going to push for it
is if domains start enforcing DMARC and requiring DMARC from all senders. Right now,
in five years, DMARC might be widespread enough that we could start blocking domains that don't have DMARC. But right now, DMARC is a question mark as to whether it's going to actually put a dent in spam. It has potential. Okay, so is it effective? I'm just going to use the example of my domain.
It was registered in '95, so I've had it for 23 years. I've used the same primary email address for those 23 years. I'm almost certain it's on every spam list known to spammerkind, because they share these things. I publish my email address everywhere. You can find it in Google Groups. You can find it
in some of the OpenBSD lists. You can find it in commit messages. You can find it all over the internet. I am well known. And as I noted at the beginning of the talk, it's in all of my slides. And my email address is the domain catch-all. So even if I'm not in their spam list as akp,
I am also there as aaron, joe, bob, aaa, a123, admin, root. I receive mail for every address that isn't already allocated. Right now, I typically receive zero to three spams per week in this domain. Occasionally, I'll see a peak of five in a week. By and large,
I don't see a whole lot of spam. I've got some other domains that I'm implementing this on and rolling it out to. What I'm hoping, though... there's a note there at the bottom: the plural of anecdote is not data. So do I have analytics? Not yet. And the reason why is this was built...
No. No. It's all pre-MTA. I used to run ClamAV, but somehow or another misconfigured it out. And since I run mostly Macs and BSD in the network,
I'm not worried about viruses. That's right. Now, there is one other thing I do. I debated whether to put the slide up, and I didn't, so I'll just tell you all one of the other things I do. On my account
and my account only, when I put something into junk mail and it's marked read, I have another script that goes through and mines the IP addresses. The assumption is that if I looked at it in junk, left it as read, and didn't move it out, then it's crap, and I want it to go in my blacklist.
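Mining IPs from junk mail can be sketched with Python's `email` module. This is an assumption-laden illustration rather than the speaker's script: it takes raw message text, looks only at the topmost Received header (assumed to have been added by your own MX, recording the connecting client in square brackets), and pulls out the bracketed IPv4 address. Setups with several internal hops would need to skip their own headers first.

```python
import email
import re
from typing import Iterable, Optional, Set

# Connecting client as commonly recorded by the receiving MTA: "[a.b.c.d]".
BRACKET_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def connecting_ip(raw_message: str) -> Optional[str]:
    """IP from the topmost Received header, or None if not found."""
    msg = email.message_from_string(raw_message)
    received = msg.get_all("Received") or []
    if not received:
        return None
    match = BRACKET_IPV4.search(str(received[0]))
    return match.group(1) if match else None


def junk_ips(raw_messages: Iterable[str]) -> Set[str]:
    """Unique connecting IPs across a batch of junk messages."""
    return {ip for ip in map(connecting_ip, raw_messages) if ip}
```

Only the topmost header is trusted because everything below it was written by the sender's side and can be forged.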
So that's one more place where I mine IP addresses: my own junk. I wouldn't roll this out broadly, because it requires the user to understand that it has to be in the junk mail, and if you leave a good email in there as read... Now, of course, your whitelist would probably, again, save your butt,
but, you know... That's right. At some point in the universe,
AC gets hacked for whatever reason, because it often becomes a spam source. The second thing... So you're talking about egress from the network. Yes. That forces them,
if they're going to send mail through your mail server, that's what they will say. So the second thing is put a script on your mail server to... And you can put a high number, like 1,000 recipients per IP address, and that second block is now you're forcing them to go through your mail server. You've done a central choke, and if it's a spammer,
they're going to try to do way more than 1,000. But it's a good number to start, and you won't catch many normal users sending to 1,000 recipients in 24 hours. And that means you're counting the recipients, not the number of emails, because spammers will send, you know, 1,000 recipients in one actual email.
So, for those who are playing the home game, as the gentleman suggested: within your network, be a good neighbor. Block outbound port 25, force everybody to go through your own MTA, and then put a maximum outbound recipient value, like 1,000 recipients. The other thing some people do is outbound scanning.
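The outbound recipient limit can be sketched as a rolling counter per sending IP. This is a hypothetical Python illustration, not code from the talk's repositories: it counts recipients rather than messages over a 24-hour window, defaults to 1,000, and accepts per-IP overrides (the Q&A later mentions giving specific hosts 4,000 or 10,000). Where the MTA would call `allow()` is left as an assumption.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Mapping, Optional, Tuple


class RecipientLimiter:
    """Rolling per-IP recipient counter (recipients, not messages)."""

    def __init__(self, default_limit: int = 1000, window: int = 24 * 3600,
                 overrides: Optional[Mapping[str, int]] = None) -> None:
        self.default_limit = default_limit
        self.window = window
        self.overrides = dict(overrides or {})   # hypothetical per-IP limits
        self._events: Dict[str, Deque[Tuple[int, int]]] = defaultdict(deque)
        self._totals: Dict[str, int] = defaultdict(int)

    def limit_for(self, ip: str) -> int:
        return self.overrides.get(ip, self.default_limit)

    def allow(self, ip: str, recipients: int, now: int) -> bool:
        """Record and accept the message if it keeps ip within its limit."""
        events = self._events[ip]
        while events and now - events[0][0] >= self.window:
            _, count = events.popleft()           # expire sends outside window
            self._totals[ip] -= count
        if self._totals[ip] + recipients > self.limit_for(ip):
            return False                          # over limit: reject/report
        events.append((now, recipients))
        self._totals[ip] += recipients
        return True
```

Rejected messages are not counted toward the total in this design. For the hourly variant described later in the session (about 100 per hour, then stop and report), it could be constructed with `window=3600` and `default_limit=100`.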
A lot of ISPs by default do that. A lot of ISPs do that on the service
that has the last 25 out of 1. Right. Okay. So analytics, like I said, this was built through accretion, and so I didn't start out
with the intention of, well, how do I build a pre-MTA filtering system? I built it up through accretion, and so at first, I didn't worry about analytics. My analytics were, oh, gosh, my inbox doesn't have that much spam. That's good stuff. But I want to be able to show that. So I'm working on some ideas
about how to count denied connections versus actual connections, and count the amount of spam that came in that was actually junk versus what wasn't. This may require a little bit of coding, but I've got some ideas about how I may be able to accomplish it. Maybe by EuroBSDCon
or AsiaBSDCon or BSDCan next year, I'll actually have analytics, and I can demonstrate how well these techniques work and give you real numbers. Part of what motivated me to do this is that one of my friends, whose domain I work on often, has a Barracuda, and his Barracuda does give that kind of data. I was like, oh, that's kind of useful.
I'd like to have numbers like that. So it inspired me. Okay. So where can you find the code? SPF-Fetch is the project where you can find the scripts. There are a few scripts that aren't in there yet that I need to push, so just watch the repo. There's a link at the end of the slides, and on the contact-me slide
it's just github.com/akpoff. I'm akpoff everywhere. I want to be known, and if I don't want to be known, then I'm somebody else. A few odd credits. Gilles from the OpenSMTPD project, partly for writing OpenSMTPD,
which has made my life a lot easier as a mail admin, but also for chatting and converting SPF-Fetch into C for spf walk. Peter, whose blog I... Actually, I think I follow his tweets more often than I go to the blog, but Peter rants frequently about spammers.
They're good rants. This isn't bad ranting; this is really effective, good stuff. Obviously, the OpenBSD developers: PF really is a lot of the core of what makes all this work together. Any good firewall should allow you to do a lot of this,
but since I work on... Yeah, yeah. And he's like, just stop there. It's all PF. But really seriously, PF does help me out immensely. The folks who took my tutorial the past two years asked me a lot of questions, pushed me hard on ways to do things. How do I solve such and such a problem?
And then, of course, the folks who show up talk to me before and after these things, coming to BSDCan, sharing thoughts and ideas on how we can fight spam. Okay, so... Any other questions? I might have answers.
Nothing... Right. Right. So for the home viewers, what Henning said is this is all fantastic,
but in the case of stolen credentials, and we've all... We all receive these. Someone's Yahoo account gets hacked, and their email... Everybody in their address book all of a sudden starts getting spam from them. Oh, I've seen it. Yeah, right.
Well, the script that most people don't do...
Right. The script that this IP is 4,000 and this IP is 10,000, and if you're not in that list, then you're 1,000. So that's the solution to that, is that, yes, there are some users that have legitimately said...
Well, but it doesn't even have to be 1,000. One of my clients turns out his secretary was all of a sudden spamming all of his customers, but his customer base is under 1,000, and so limits like that wouldn't have caught it. So in that case, outbound ClamAV scanning was the solution,
was to scan all the outbound emails. Do you remember when they're in the... They actually have that. Yeah. In my case, it won't be an ISP.
Yeah. Yes, sir. We have a DSL provider, the DSL, so we don't care about it,
to answer the question before. I can block everything because the DSL, that's somebody else's provider, so I just block everything and turn it on. But from the hosting side of it, rather than block, looking for like 1,000 a day, I look for like 100 an hour or something like that,
and I stop. Yeah, great. And I stop and report. I get reports in my email says somebody's sending a lot, but if they just wait, they can send some more. So that'll stop them and immediately fly. I get the reports, and I say, oh, somebody's spamming. Caught my old sister that way. And I can review them,
change their password, call them and say, hey, here's your new password. Don't give it out. Don't use password1 anymore. So, real quick, there's nobody in here after 2:30, not until 3:30, so we can keep talking, but I just wanted to throw up some quick contact details and then mention the links.
It is just a few odd links. I'll put the slides up onto the BSDCan website. I also, on my GitHub, have a repo called Talks. You can find all my old talks. You can find my SMTPD tutorial. I've heard from people that it is possible to follow the tutorial without my personal presence
and build a mail server, so give that a whirl if you want to build a server. I would recommend it. I've known a lot of people who've quit running their own mail servers because, one, they thought it was too hard, or two, they didn't want to deal with the spam or whatever. It's really not that bad if you're willing to put a little bit of time into it. This isn't a set-and-forget kind of thing.
If you're running a static blog and you put it in HTTPD, all you've got to really do is run your updates from time to time, but a mail server does require a little more care and feeding, but it is possible to run a SOHO mail server and derive real value from it. Right. Is it 2.45?
Okay, great. Feel free to grab me afterwards. I'm always happy to talk about these topics. If you find issues with the scripts, you think I could have done something better, you've got diffs, send all that in, but I probably won't get the repo updated until I get back to Houston because I've got a 6 a.m. flight,
so give it a day or two if it doesn't seem like I'm moving very fast on getting that published. Thank you all for coming. Appreciate it.