Welcome to the latest edition of the Modern Service Management for Office 365 blog series! In this article, we review monitoring tools and techniques to manage information security using audit data. These insights and best practices are brought to you by Carroll Moon, Senior Architect for Modern Service Management.
Monitoring: Audit and Bad-Guy-Detection
In the Monitoring and Major Incident Management post, we discussed how monitoring can mean many things. In that post, we focused on monitoring for availability and performance. In this post, we will focus on audit and bad-guy-detection.
The most important part of this discussion is the ManagementActivityAPI. That API was announced in April of 2015 here. The MSDN reference is here and the schema is here. Many IT Pros do not currently have the skills (which, of course, they could learn) or the time to work with the API directly. My goal here is to help simplify the discussion so everyone can start to get business benefit from the API.
What should you be looking for?
Before we get into the “how”, we should discuss the “why” and the “what”. The question that most readers will be asking is “why should I care?” The answer to that question is “it comes down to scenarios”. Would it help you with audit and compliance requirements to be able to provide a report for “which admins gave themselves permissions to another user’s mailbox” last month? Would it help you with security monitoring to alert on the condition of “if an account has X failed login attempts in Y minutes”? Would it help your Service Desk become more proactive to have a report of all users who got “access denied” for SharePoint files, so the Service Desk can reach out and train them?
There are countless scenarios that can be enabled by this data. I encourage you to spend time talking to your compliance, security, audit and Service Desk teams to brainstorm how this data can help them be more successful in reaching their goals. Once you land on a couple of scenarios (the “why” and “what”), then it will be more fun to talk about the “how”.
We have the API, now what?
I have included the links to the API’s reference and schema above, but what does that mean? To use the API, one needs to think of the data flow in steps:
- We need to get the data from the API. We would create a job that runs every N minutes using an appropriately permissioned service account to pull the data.
- We need to store the data. Usually, this would imply that we have a big-data-type monitoring or reporting solution in place that we will use for this data. If we do not have something already, we likely will want to go through a planning exercise.
- Once we have the data in a usable format, we need to quantify the queries against the data for the scenarios that matter to us. Remember, we discussed the scenarios above. For example, “I want to know if an account has X failed logins in Y minutes”.
- And once we’ve quantified our scenarios and related queries, we need to expose the data. We will likely want reports and alerts. We will want to tie the alerts to our existing single pane of glass, and we will want to define the subsequent workflows. For those alerts, we are effectively just adding “Major Incident” scenarios to our list discussed already in this blog post.
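To make the query step more concrete, here is a minimal sketch of the “X failed logins in Y minutes” scenario, written as a sliding-window check over audit records that have already been pulled and stored. It is an illustration under stated assumptions, not a reference implementation: the function name `find_failed_login_bursts` is hypothetical, the records are assumed to be dictionaries carrying `CreationTime`, `UserId`, and `Operation` fields, and the operation value `UserLoginFailed` is assumed. Verify the field names and values against the schema before relying on them. The sample data is made up.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Assumption: failed sign-ins appear with this Operation value.
# Confirm the exact value against the API schema for your tenant.
FAILED_LOGIN_OPERATION = "UserLoginFailed"


def find_failed_login_bursts(records, max_failures, window_minutes):
    """Return {user_id: timestamp of the failure that first reached the threshold}.

    A user is flagged once `max_failures` failed logins fall within any
    span of `window_minutes` (inclusive).
    """
    window = timedelta(minutes=window_minutes)

    # Keep only failed logins, parse their timestamps, and order them by time.
    events = sorted(
        (datetime.fromisoformat(r["CreationTime"]), r["UserId"])
        for r in records
        if r.get("Operation") == FAILED_LOGIN_OPERATION
    )

    recent = defaultdict(deque)  # user -> failure times still inside the window
    flagged = {}
    for ts, user in events:
        times = recent[user]
        times.append(ts)
        # Drop failures that are older than the window relative to this one.
        while ts - times[0] > window:
            times.popleft()
        if len(times) >= max_failures and user not in flagged:
            flagged[user] = ts
    return flagged


# Made-up sample records for illustration only.
sample = [
    {"CreationTime": "2016-01-01T10:00:00", "UserId": "alice@contoso.com", "Operation": "UserLoginFailed"},
    {"CreationTime": "2016-01-01T10:00:00", "UserId": "bob@contoso.com", "Operation": "UserLoginFailed"},
    {"CreationTime": "2016-01-01T10:01:00", "UserId": "alice@contoso.com", "Operation": "UserLoginFailed"},
    {"CreationTime": "2016-01-01T10:02:00", "UserId": "alice@contoso.com", "Operation": "UserLoginFailed"},
    {"CreationTime": "2016-01-01T10:30:00", "UserId": "bob@contoso.com", "Operation": "UserLoginFailed"},
    {"CreationTime": "2016-01-01T10:31:00", "UserId": "bob@contoso.com", "Operation": "UserLoggedIn"},
]

alerts = find_failed_login_bursts(sample, max_failures=3, window_minutes=5)
for user, when in alerts.items():
    print(f"ALERT: {user} reached the failed-login threshold at {when.isoformat()}")
```

In a real deployment this logic would more likely live as a saved query in whatever monitoring or reporting store holds the data, with the result feeding the alerting workflow. The point of the sketch is only that once X and Y are agreed with the security team, the query itself is small.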
Surely, there is an easier way…
There are starter solutions for what I’ve described above. Many 3rd parties have built solutions that pull from the API and start to quantify patterns, reports, and alerts that most enterprise customers will care about. See this announcement for a list of some of the 3rd parties who have done work in this area.
But what about a first-party solution from Microsoft?
There is a feature in Azure called Operations Management Suite (OMS). Within OMS, there is a feature called Log Analytics. Note that as of this writing, the introductory level of Log Analytics is free as described here.
On top of OMS Log Analytics, Microsoft has published a public preview of a solution to do steps 1-4 for our enterprise customers. You can learn more about that solution pack here.
Side Note: I encourage everyone to think of OMS Log Analytics not as a “new tool”, but as a feature in Azure that can be leveraged to enable scenarios. Just as Azure Machine Learning allows you to take advantage of ML capabilities without building your own proprietary ML platform, OMS Log Analytics allows your business to take advantage of “big-data-monitoring” with minimal overhead. You just turn the feature on, and you pay only as you choose to store more information. It is a great new paradigm.
Wrapping it up
- The ManagementActivityAPI provides data that allows customers to be very creative with scenarios for monitoring and reporting
- No matter what toolset you use (or even if you do not have any tools), you can download, store, query, report, and alert on the data from the ManagementActivityAPI to enable your scenarios
- Many 3rd parties have added solutions on top of their existing products to help everyone get started with these scenarios
- Microsoft has a first-party solution (in preview) in Azure’s OMS Log Analytics feature that will help you get started very quickly
- I’d love to hear about your business’s scenarios. Let me know how you are using the ManagementActivityAPI to help your business via Twitter at @carrollm_itsm