Using sed to match a series of email addresses, and exclude certain domains (Office 365 Exchange online migration)
Posted by rbTech Staff, Last modified by rbTech Staff on 22 July 2016 04:49 PM

We're doing an O365 migration, and this particular site has a large list of domains that are configured in their on-prem Exchange but that they don't want to migrate.

I have the list of domains we don't want, a list of domains we *do* want to move, and a file with a dump from Exchange containing each user's display name, primary SMTP address, and the additional email addresses configured in AD.

To get a file listing each user, their primary SMTP address and all other SMTP endpoints, run the following command in Exchange PowerShell:

Get-Mailbox -ResultSize Unlimited | Select-Object DisplayName,ServerName,PrimarySmtpAddress,@{Name="EmailAddresses";Expression={$_.EmailAddresses | Where-Object {$_.PrefixString -ceq "smtp"} | ForEach-Object {$_.SmtpAddress}}} | Export-Csv C:\TEMP\primarySMTPendpoints.csv

The file formats are as follows:

DisplayName,PrimarySmtpAddress,EmailAddresses
Fred Smith,[email protected],[email protected] [email protected] [email protected] [email protected]
Lana James,[email protected],[email protected] [email protected] [email protected]
Mary Trumbull,[email protected],[email protected] [email protected]

Domains to keep: domain.tld, secondarydomain.tld

Domains to skip: unuseddomain.tld, someotherdomain.tld

I wanted to write a slick awk and sed script to parse through the file and delete any email address at the domains to skip. I ended up with the trainwreck below:

cat primarySMTPendpoints.csv | sed -s 's/\b\([A-Za-z0-9._%+-]\)\[email protected]//g' | sed -s 's/\b\([A-Za-z0-9._%+-]\)\[email protected]//g' | sed -s 's/ \+/ /g' > yum.txt

Update: I did some more reading and remembered that sed is 'greedy' in its use of regexes. So a mild improvement (it's *significantly* faster, but no more readable) on the mess above is as follows:

cat primarySMTPendpoints.csv | sed -s  's/\b\([A-Za-z0-9._%+-]\)\[email protected]//g; s/\b\([A-Za-z0-9._%+-]\)\[email protected]//g; s/ \+/ /g' > yum.txt  

Update 2: Just because I'm more than a little OCD, I kept hacking on this later, because the messiness of it all was a full assault on my sense of order in the universe. It's better:

cat primarySMTPendpoints.csv | sed -s 's/\b\([A-Za-z0-9._%+-]\)\[email protected]\|someotherdomain.tld//g; s/ \+/ /g' > yum.txt

The above is significantly more readable, and I'm satisfied at this point since this is what I was originally after.  Whew.
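
One thing worth checking in a combined pattern like this: alternation binds more loosely than everything else in a regex, so without a group around the domains the second branch is just the bare domain name, and the part to the left of the @ is left behind. A minimal sketch of the difference follows, assuming a sed that supports -E (GNU and BSD sed both do). The sample addresses are made up for illustration, and the function names are hypothetical.

```shell
# Hypothetical sample row; the addresses are invented for this example.
row='Fred Smith,fred@domain.tld,fred@domain.tld fred@unuseddomain.tld fsmith@someotherdomain.tld fred@secondarydomain.tld'

# Ungrouped: the | splits the whole pattern in two, so the second
# branch matches only the bare text "someotherdomain.tld".
strip_ungrouped() {
  sed -E 's/[A-Za-z0-9._%+-]+@unuseddomain\.tld|someotherdomain\.tld//g; s/ +/ /g'
}

# Grouped: the parentheses limit the | to the domain part, so the
# local part and the @ are required for both domains.
strip_grouped() {
  sed -E 's/[A-Za-z0-9._%+-]+@(unuseddomain\.tld|someotherdomain\.tld)//g; s/ +/ /g'
}

printf '%s\n' "$row" | strip_ungrouped
printf '%s\n' "$row" | strip_grouped
```

The ungrouped version leaves a dangling "fsmith@" in the row, while the grouped version removes the whole address.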

What I was hoping for was a regex that would match the domain list as well as any possible email address to the left of the @, but was thoroughly unsuccessful at that.  I'd be delighted if someone came up with a more elegant way to do this than run 15 regexes in sequence (which is what I ended up doing because I was out of time for the task and needed to get it done).
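
For what it's worth, one way to avoid a long chain of regexes might be to build the alternation from the skip list itself and run a single substitution. The sketch below assumes the domains to skip sit one per line in a text file, and that sed supports -E. The function names, the file handling and the sample address row are all hypothetical, not part of the original task.

```shell
# build_skip_regex: read domains (one per line) on stdin and print an
# ERE alternation, e.g. unuseddomain\.tld|someotherdomain\.tld
build_skip_regex() {
  # Drop empty lines, escape literal dots, then join the lines with |.
  sed -e '/^$/d' -e 's/[.]/\\./g' | paste -sd '|' -
}

# strip_domains FILE: filter the CSV on stdin, removing every address
# whose domain is listed in FILE.
strip_domains() {
  alternation=$(build_skip_regex < "$1")
  # An empty list would yield "@()", which matches any "local@",
  # so pass the input through unchanged in that case.
  [ -n "$alternation" ] || { cat; return; }
  # Remove matching addresses, squeeze repeated spaces, trim the end.
  sed -E "s/[A-Za-z0-9._%+-]+@(${alternation})//g; s/ +/ /g; s/ +\$//"
}

# Demo with a temporary skip list and one made-up row.
skip_list=$(mktemp)
printf '%s\n' unuseddomain.tld someotherdomain.tld > "$skip_list"
printf '%s\n' 'Lana James,lana@domain.tld,lana@domain.tld lana@someotherdomain.tld lj@unuseddomain.tld' | strip_domains "$skip_list"
rm -f "$skip_list"
```

Against the real export this would be run as something like: strip_domains skip_domains.txt < primarySMTPendpoints.csv > yum.txt, where skip_domains.txt is an assumed file name. Two caveats: a leading space can remain after the comma when the first address in the list is removed, and a listed domain also matches as a prefix of a longer one (unuseddomain.tld.au, say), so the output is worth a quick look before it is used.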


