Chapter 7. Monitor and optimize a SharePoint environment

Up to this point, we’ve been talking about what steps are required to design a viable SharePoint 2016 environment. Designing the topology, planning security, installing, and configuring the environment lead to the eventual completion of the implementation phase, but this isn’t the end; in fact, it’s just the beginning.

Moving forward, the environment will need to be evaluated prior to its release to production, to ensure that services are running as desired and that the farm is scaled to meet service level agreement (SLA) and capacity expectations. Once the farm has been released, it will continue to require careful maintenance, planning, and management to ensure service to a growing user base.

In this chapter, we focus on three major tasks: monitoring, tuning and optimization, and troubleshooting. Monitoring is concerned with the use of tools and metrics to minimize system failures or loss of performance. Tuning and optimization expand on these concepts, providing insight into how to optimize the performance of not only SharePoint, but SQL and Internet Information Services (IIS) as well. Troubleshooting is perhaps the most complex of these tasks and is concerned with establishing performance baselines as well as using both client and server tools to troubleshoot issues as they occur.

Skill: Monitor a SharePoint environment

The first few steps after implementation are often the most critical. The organization begins putting the new environment to use, causing a sudden uptick in user adoption. As users begin to evaluate the farm, the extra resource load places stress on it and could expose design inconsistencies not discovered during performance testing. Previously defined SLAs with the business might also be in effect, restricting the times that the system can be down for maintenance.


Important

Phasing in the adoption of a new SharePoint environment can be a critical component to mitigate the influx of new users.


Ensuring reliability and performance levels during this period is a key requirement for user adoption of the new platform. Effective administration and monitoring of the SharePoint environment can capture events, addressing any misconfigurations or design shortfalls before they affect user adoption rates.

Define monitoring requirements

Monitoring is the art of using instrumentation to analyze and predict the behaviors of a given system. Knowing what instrumentation is available and the expected values for each metric enables the administrator to adjust for the performance idiosyncrasies of a system without incurring downtime.

Each new revision of SharePoint has introduced greater potential for monitoring at a more granular level than in previous versions. SharePoint 2016 continues this pattern by providing insight into the operations of major subsystems such as Microsoft SQL, ASP.NET, IIS, and other services.

Service guarantee metrics

In the planning stages of your farm design, you should develop SLAs that define the service guarantee of functionality provided by the farm to your user base. This guarantee defines not only the times that a system can be up or down entirely, but also the availability and enforceability of scheduled outage windows.

Within the SLA for your environment, you will define terms such as downtime, scheduled downtime, and uptime percentage. These metrics merely describe at a high level what the goals of monitoring are.


Important

Reviewing a production SLA will give you a good idea of what to incorporate into your own organizational SLAs. The Microsoft Office 365 SLA is a good example of what’s included in this documentation; a version of this SLA that applies to most regions and is written in English can be found on the Microsoft Volume Licensing site at http://www.microsoftvolumelicensing.com/Downloader.aspx?DocumentId=10758.


As an example, Microsoft produces an SLA for each of the component features within the Office 365 platform. For instance, within the Microsoft Office 365 SLA agreement for SharePoint Online, you will find definitions for the following:

- Downtime: Each of the services in Office 365 has its own definition of downtime. The definition for SharePoint Online is “any period of time when users are unable to access SharePoint sites for which they have appropriate permissions.”

- Scheduled Downtime: This is defined as “periods of Downtime related to network, hardware, or Service maintenance or upgrades.” The SLA goes on to define the advance notification period as “five (5) days prior to the commencement of such Downtime.”

- Monthly Uptime Percentage: Although this is defined for each service in Office 365, each of the definitions follows roughly the same formula. For SharePoint Online, downtime is measured in user-minutes, and for each month is the sum of the length (in minutes) of each incident that occurs during that month, multiplied by the number of users affected by that incident, as shown in Figure 7-1.


FIGURE 7-1 Monthly uptime percentage formula

Immediately after these definitions, the SLA goes on to define what service credit is offered in the event of the monthly uptime percentage falling below 99.9 percent (“three nines”), 99 percent, and 95 percent.

Monitoring levels

Now that you know what your monthly uptime percentage is (for example, three nines allows a daily maximum of approximately 24 * 60 * 0.001, or 1.44 minutes, of downtime) and what constitutes downtime, you can begin to monitor the SharePoint farm (or farms) to prevent these incidents.
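
To make the budget concrete, here is a quick back-of-the-envelope calculation in Windows PowerShell (the 30-day month is an assumption for illustration):

$minutesPerMonth = 30 * 24 * 60       # 43,200 minutes in a 30-day month
$target = 0.999                       # "three nines" uptime target
$minutesPerMonth * (1 - $target)      # 43.2 minutes of allowable downtime per month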

A single SharePoint farm has three major levels at which it can be monitored (from largest to smallest):

- Server level: At this level, you will be monitoring each tier of the servers that constitute a farm, including Front-end, Application, Distributed cache, Search, Custom role, and SQL database servers.

- Service application level: At this level, you will be monitoring all the services provided within the farm, such as Managed Metadata, User Profile, Search, and others.

- Site and site collection level: At this level, you monitor all the sites and site collections contained within the farm.


Important

It’s important to remember that outages do not necessarily require the failure of an entire farm; an improperly deployed feature or a misconfigured service application can result in downtime for a considerable segment of the user base without rendering the entire farm inoperable.


Monitoring tools

There are four on-premises tools that can be used to monitor SharePoint 2016 farms: Central Administration, Windows PowerShell, system- and SharePoint-specific logs, and System Center 2012 R2 Operations Manager.

Central Administration

Central Administration allows for the configuration and monitoring of the SharePoint logs as well as configuration of usage and health providers. Additionally, Health Analyzer runs a series of rules on a regular basis that check on the status of metrics such as these:

- Free disk space on both SharePoint and SQL servers

- Service issues such as problems with State service, InfoPath Forms Services, and Visio Graphics Service

- SQL-specific issues, such as overly large content databases, databases in need of upgrade, and the read and write status for a given database


Need More Review?

A complete listing of all SharePoint Health Analyzer rules can be found on TechNet at https://technet.microsoft.com/library/ff686816(v=office.16).aspx.


Windows PowerShell

Windows PowerShell focuses on the diagnostic capabilities found in the Unified Logging Service (ULS) logs. The ULS logs can be quite detailed in scope, meaning that hundreds of thousands of entries can be found on a given server. When you choose to use the Get-SPLogEvent cmdlet, you can view trace events by level, area, category, event ID, process, or message text.


Exam Tip

Using the -MinimumLevel switch with Get-SPLogEvent enables you to look for events that are equal to or more severe than the level you specify. There are only two valid values: Error or Warning.


Additionally, you can pipe its output to the Out-GridView cmdlet to produce tabular log output in a graphical format (as shown in Figure 7-2), which can be easily refined or exported to an Excel spreadsheet for further analysis.
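
For example, the following sketch retrieves the last hour of errors and displays them in a sortable grid; the time window and the Search area filter are illustrative assumptions, not required values:

# Errors logged in the last hour for the Search area, most recent first
Get-SPLogEvent -MinimumLevel Error -StartTime (Get-Date).AddHours(-1) |
    Where-Object { $_.Area -eq "SharePoint Server Search" } |
    Sort-Object Timestamp -Descending |
    Out-GridView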


FIGURE 7-2 Using the Out-GridView cmdlet


Need More Review?

For a more detailed discussion of the use of Windows PowerShell for viewing SharePoint diagnostic logs, see the TechNet article “View diagnostic logs in SharePoint 2013” at https://technet.microsoft.com/library/ff463595.aspx.


System and SharePoint Logs

Logs for monitoring and diagnosing SharePoint come from two distinct sources. At the operating system level, you find the standard event logs, in which events that concern SharePoint and its supporting technologies (SQL, IIS, and so on) are recorded (primarily in the Application and System logs). As previously mentioned, SharePoint also records information in its own series of trace logs, otherwise known as the ULS logs.

System Center 2012 R2 Operations Manager

If you have a larger SharePoint farm (or multiple farms), you might find that the monitoring of each individual system becomes too time consuming, and that even the use of the usage and health providers is not enough to provide a complete picture of the systems required to support SharePoint.

For this purpose, Microsoft produces a product known as System Center 2012 R2 Operations Manager. Using this tool set with the System Center Management Pack for SharePoint 2016 not only allows for the effective monitoring of multiple SharePoint farms and their component systems, but also for the alerting and preventative actions required to assist in the maintenance of service level guarantees.


Need More Review?

System Center 2016 will be released soon (as of this writing, it is at Technical Preview 5). Until then, the current version is System Center 2012 R2, and the management pack used by this version for SharePoint 2016 can be found on the Microsoft Download Center at https://www.microsoft.com/download/details.aspx?id=52043.


The System Center Monitoring Pack monitors Microsoft SharePoint Server 2016 by collecting SharePoint component-specific performance counters in one central location and raising alerts for operator intervention as necessary. This tool allows you to proactively manage SharePoint servers and identify issues before they become critical by detecting issues, sending alerts, and automatically correlating critical events.

Configure performance counter capture

As your SharePoint installation grows in scope, you might want to evaluate the performance of particular servers within the farm. Examining the performance level of each server in the farm from an operating system perspective is one way to predict areas in which more system resources or configuration changes are required.

One tool that can be used for this purpose is Performance Monitor (PerfMon), a tool that is natively installed along with Windows Server. This tool enables you to monitor and capture metrics about your individual servers using a series of performance counters.

As SharePoint, SQL, and other applications are added to a server, additional performance counters for those applications are made available to PerfMon. These new counters describe performance and health metrics that are specific to each application (or its major components).

Starting a performance monitoring capture

To start a new performance capture, begin by opening the PerfMon App. In Windows Server 2012 R2, you can do this by going to your Start screen and selecting the tile for PerfMon (see Figure 7-3).


FIGURE 7-3 Performance Monitor (tile)


Important

If you are having difficulty locating PerfMon, there is another way to start it. Simply start a search and then type PerfMon into your Search box and select its icon.


Selecting the Performance Monitor menu item causes the capture graph to appear, as shown in Figure 7-4.


FIGURE 7-4 Performance Monitor (default view), capturing the % Processor Time

Not too impressive, is it? As PerfMon is intended to be customized, there isn’t much to see when it first appears. By default, all that PerfMon captures is the % Processor Time performance counter over roughly a 100-second interval.

Adding SharePoint counters to Performance Monitor

All sorts of SharePoint-specific counters can be added to a PerfMon capture. Counters are included for SharePoint itself, but counters for other represented subsystems include the following:

- Access Services (2010 and 2013 versions)

- InfoPath Forms Services 16

- Office Online

- PerformancePoint Services

- Project Server

- Search (including Graph, Gatherer, and so on)

- Visio Services

Adding a counter to an existing performance capture is fairly straightforward, requiring only that you select the plus (+) icon in the toolbar and then choose a counter (for example, Current Page Requests, All Instances, as shown in Figure 7-5).


FIGURE 7-5 Adding a counter


Important

One of the most valuable pieces of information in the Add Counters dialog box is the often overlooked Show Description check box. Selecting this box shows a description of what the counter actually does within the system.


Building and reporting performance using a data collection set

In the previous example, we added counters one by one. Performing such an ad hoc capture has limited usefulness, and generally means two things:

- Starting another new PerfMon session would require that the counters be added back in.

- Although a point-in-time view is useful, there is no meaningful way to capture and replay the counters as they appear.

The next logical step would then be to build a data collector set to monitor the performance counters, and then use the Reports feature of PerfMon to replay the captured metrics. Creating a data collector set isn’t terribly difficult, and becomes a useful tool in capturing baseline server and farm behavior for later comparison.

Within Performance Monitor, select Data Collector Sets, right-click User Defined, and then select New, Data Collector Set, as shown in Figure 7-6.


FIGURE 7-6 Creating a new data collector set

Enter a name for your new data collector set, select Create Manually (Advanced), and then click Next (see Figure 7-7).


FIGURE 7-7 Creating a data collector set (not using a template)


Important

There are no OOB data collector sets for SharePoint 2016.


Selecting Create Data Logs Using Only Performance Counters prompts you to choose which counters you’d like to log by clicking Add, which results in the options listed in Figure 7-8.


FIGURE 7-8 Selected counters

The five counters shown in Figure 7-8 are those that are often used to check the basic health of the server on which SharePoint is installed:

- % Processor Time (Processor Information section, All Instances): Shows processor usage over time.

- Avg. Disk Queue Length (Logical Disk section, All Instances): Shows the average number of both read and write requests that were queued for the selected disks during the sample interval.

- Available MBytes (Memory section): Shows how much physical memory is available for allocation.

- % Usage and % Usage Peak (Paging File section, All Instances): Shows the current and peak values for paging file used.
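
Before committing these counters to a data collector set, you can sample them interactively to verify their paths and current values. A minimal sketch using the built-in Get-Counter cmdlet follows; instance names can vary between servers:

# Sample the core health counters every 15 seconds, 20 times (5 minutes total)
Get-Counter -Counter @(
    "\Processor Information(_Total)\% Processor Time",
    "\LogicalDisk(_Total)\Avg. Disk Queue Length",
    "\Memory\Available MBytes",
    "\Paging File(_Total)\% Usage"
) -SampleInterval 15 -MaxSamples 20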


Need More Review?

The counters listed previously are those commonly used to troubleshoot core performance levels on a SharePoint server. Optimal values for the major performance counters can be found in the TechNet article “Monitoring and maintaining SharePoint Server 2013” at https://technet.microsoft.com/library/ff758658.aspx.


In the Create New Data Collector Set page, the selected counters are displayed, along with a sampling interval. You might wish to change this interval (time between counter samples in seconds). Click Next to continue (see Figure 7-9).


FIGURE 7-9 Logged performance counters in the new data collector set

The data captured in this data collector set is stored by default in the %systemdrive%\PerfLogs\Admin\<set name> directory, although you can specify a particular log location (Figure 7-10).


FIGURE 7-10 The default root directory

The final step in creating the new data collector set is to choose which account it runs under and then choose from a series of actions:

- Open Properties For This Data Collector Set: Enables you to specify additional selections for your log, such as its duration.

- Start This Data Collector Set Now: Saves and immediately starts the data collector set capture.

- Save And Close: Saves and then closes this configuration process (selected in Figure 7-11).


FIGURE 7-11 Completing the data collector set

Now that the data collector set has been created, it can be used repeatedly. To start the capture, simply right-click the data collector set and select Start (see Figure 7-12).


FIGURE 7-12 Starting a performance counter capture from a user-defined data collector set

Looking in the Reports section, under User Defined, you can see that the SharePoint Server Performance capture is running (Figure 7-13).


FIGURE 7-13 SharePoint Server Performance capture, collecting data

After about five minutes, you should have a fairly good capture, but you could extend this duration if you want. After you capture enough data, select Stop, then select a report to display the metrics for a given time period along with some maximum, minimum, and average values (see Figure 7-14).


FIGURE 7-14 Metrics captured by the SharePoint Server Performance report (data collector set)

Now that you have created a data collector set, you have the option of saving this as a template. Simply selecting the user-defined data collector set and right-clicking it gives you the option to save this set as a template for later use (Figure 7-15).


FIGURE 7-15 Saving the data collector set as a template

Configure page performance monitoring

Page performance is dependent on a number of variables: whether the user has caching enabled on the desktop, whether the IIS web servers are caching artifacts such as graphics and text content, how SharePoint is caching information, and how quickly SQL Server can provide content to the SharePoint farm.

SharePoint can make use of three distinct caching mechanisms: ASP.NET output cache, Binary Large Object (BLOB) cache, and the object cache. Each of these caching mechanisms has a representative set of counters in Performance Monitor.


Exam Tip

Although these caching mechanisms all enhance performance in a SharePoint farm, know which are enabled by default and which you must enable manually. Each of these cache types can result in a shortage of resources in the SharePoint farm; know which resource types might be affected by each of the three cache types.


ASP.NET output cache counters

Output cache setting effectiveness can be monitored by viewing the values for the following ASP.NET Apps counter group in Performance Monitor (shown in Table 7-1).


TABLE 7-1 Output cache counters and optimal values


Important

In PerfMon, the name of this counter is specifically listed as ASP.NET Apps, but is appended by the version number (for example, this might read ASP.NET Apps v.4.0.30319).


BLOB cache counters

BLOB cache setting effectiveness can be monitored by viewing the values for the SharePoint Publishing Cache counter group in Performance Monitor shown in Table 7-2.


TABLE 7-2 BLOB cache counters and optimal values

Object cache counters

Object cache setting effectiveness can be monitored by viewing the values for the SharePoint Publishing cache counter group in Performance Monitor shown in Table 7-3.


TABLE 7-3 Object cache counters and optimal values

Configure usage and health providers

Monitoring is an integral part of any IT administrator’s job; although this person is occasionally called on to perform reactionary maintenance (to fix things that go wrong during production hours), the lion’s share of duties should be focused on preventive administration. Monitoring logs and other metrics provided by the systems they support is a key component of long-term IT success.

SharePoint presents a special challenge from an administrative standpoint because it is dependent on several other technologies, such as SQL Server, IIS, ASP.NET, and the underlying operating system.

At any given time, a SharePoint administrator might need to know metrics such as these:

- How well is IIS serving pages?

- Are SharePoint farm member servers functioning correctly, from an operating system standpoint?

- How are the data tier servers functioning with respect to the load placed on them by the SharePoint farm?

Add the monitoring of several SharePoint servers into the mix, and there are a lot of logs to be checked, especially for a smaller IT team. IIS, SharePoint, and SQL logs can be individually monitored, but each individual logging system paints only a partial picture of the health and well-being of a SharePoint farm.

Event selection

When you are configuring the usage and health data collection for the farm, you will be given the opportunity to choose from a series of events to capture. Several of these events are enabled by default, although you can choose to deselect them before enabling the usage and health data provider to enhance performance.

Table 7-4 shows a listing of events that can be logged along with their initial logging state.


TABLE 7-4 Potential logging events

Configuring usage and health data collection

A newly created SharePoint installation creates the usage and health data collection services, but does not activate or configure them by default. These services can affect performance; as a result, they should not be activated until after the farm is fully configured, but prior to user acceptance testing.

To display the usage and health data collection configuration, on the Monitoring page in Central Administration, in the Reporting section, select Configure Usage And Health Data Collection.


Exam Tip

Be familiar with the steps required to both enable and configure usage and health providers—specifically how to schedule the log collection and select the events being captured.


There are six major components to the configuration of usage and health data collection:

- Usage Data Collection: Choose to either enable or disable data collection (enabled by default).

- Event Selection: The selection of which events are to be captured within the logging database, as shown in the last section.

- Usage Data Collection Settings: Specifies the log file location on all SharePoint farm servers (defaults to %ProgramFiles%\Common Files\Microsoft Shared\Web Server Extensions\16\LOGS).

- Health Data Collection: Choose whether or not to enable health data collection settings and edit the health logging schedule (if necessary).

- Log Collection Schedule: Choose whether you want to edit the log collection schedule via the Usage Data Imports and Usage Data Processing timer jobs.

- Logging Database Server: Displays the current database server and name for the logging database along with the authentication method used to connect to SQL (Windows authentication or SQL authentication).


Important

The database server and name are intentionally unavailable; these values can be reconfigured via Windows PowerShell cmdlets.


After you have made your selections on this page, click OK to activate the usage and health data collection functionality.
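
As the Important note above indicates, the logging database server and name cannot be changed on this page, but they can be changed from the SharePoint Management Shell. A hedged sketch follows; the server and database names are hypothetical:

# Point the usage and health (logging) database at a new server/database
Set-SPUsageApplication -DatabaseServer "SQL02" -DatabaseName "WSS_UsageApplication"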


Need More Review?

Obviously, special care should be taken to ensure that the logging database does not fill all its available space. The benefit of having operating system, SharePoint, and SQL counters captured in the logging database is to gather a complete picture of your farm and all its member servers (including database servers) for analysis as required. For more information on this functionality, see the TechNet article “Monitoring and maintaining SharePoint Server 2013” at https://technet.microsoft.com/library/ff758658.aspx.


Logging database functionality

As the SharePoint farm is being created, the WSS_UsageApplication (default name) database is created for use in logging performance metrics. Individual metrics, captured by each member server of the SharePoint farm, can be combined on a regular basis and stored in a series of partitioned tables within the logging database. This database also provides a series of predefined SQL views that can be used to produce output using Microsoft Excel.


Need More Review?

This logging database is unlike any other in SharePoint 2016 because it is the only one created for the express purpose of querying via SQL Server Management Studio (SSMS). For more details about the views and stored procedures present within the logging database, see the article “View data in the logging database in SharePoint 2013” at https://technet.microsoft.com/library/jj715694.aspx.


After the usage and health data collection has been configured in Central Administration, logged events are stored in a series of tables (partitioned by day), as shown in Figure 7-16. Each table has a total of 32 partitions, one for each possible day of a given month (Partitions 1 through 31) and another specifically intended to contain the current day’s logs (Partition0).


FIGURE 7-16 WSS_UsageApplication database tables

There are three timer jobs responsible for the collection and aggregation of logging data in a SharePoint 2016 farm:

- Microsoft SharePoint Foundation Usage Data Import: This job runs every five minutes by default and imports usage log files into the logging database.

- Microsoft SharePoint Foundation Usage Data Maintenance: This job runs hourly and performs maintenance in the logging database itself.

- Microsoft SharePoint Foundation Usage Data Processing: This job runs once daily and expires usage data older than 30 days.

The usage data import timer job is fairly self-explanatory: All it does is extract logging data from every member of the farm and load this information into the logging database tables (by category) for further analysis. This information is temporarily stored in the _Partition0 table so logging information can be regularly added throughout the day. At day’s end, the usage data processing job accumulates and analyzes the current day’s log information, removing it from _Partition0 and storing it in one of the 31 different daily partitions.

As an example, if today were July 10 and you selected the top 1,000 rows from the dbo.AccessServicesMonitoring_Partition10 table, you would likely be seeing logs from June 10; today’s logs would still be stored in the _Partition0 table until the date rolls over to July 11. At that point, the logs for July 10 would be moved by the usage data processing timer job to the _Partition10 table, and the _Partition0 table would be reset.
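
You can inspect these timer jobs, or trigger an immediate import instead of waiting for the five-minute schedule, from the SharePoint Management Shell. The internal job name below is the one commonly associated with the usage data import job, but treat it as an assumption and verify it against the listing in your own farm:

# List the usage-related timer jobs and their schedules
Get-SPTimerJob | Where-Object { $_.Name -like "*usage*" } |
    Select-Object Name, Schedule, LastRunTime

# Run the usage data import immediately (job name assumed; confirm it first)
Get-SPTimerJob -Identity "job-usage-log-file-import" | Start-SPTimerJob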

Monitor and forecast storage needs

Predicting the storage requirements of a SharePoint installation requires a combined effort from both farm and site collection administrators. Either of these administrators can monitor storage and record the data growth rate for predicting future database size requirements.

Although the farm administrator can often “drill down” to the same administrative level as the site collection administrator, his or her unfamiliarity with the data or its retention requirements makes the administration of storage at this level quite a bit more difficult. Likewise, the site collection administrator often has no insight into the available storage outside of a particular site collection.


Exam Tip

Capable monitoring of individual content databases is important, but understanding and addressing growth trends is even more important. Be extremely familiar with the process of moving site collections from one content database to another using PowerShell cmdlets, how to create a new content database attached to the same web application, and how to restrict the addition of new site collections to a content database that is already quite large.


Monitoring content databases

Central Administration does not provide a way for farm administrators to review the size of content databases. Windows PowerShell, on the other hand, provides a couple of different cmdlets for reviewing SharePoint databases.

- Get-SPDatabase: This is the more generic of the two commands, and it will retrieve information about all databases within a SharePoint farm: configuration, content, and service application.

- Get-SPContentDatabase: This focuses specifically on databases that possess SharePoint content.


Exam Tip

Be familiar with these two very similar sounding cmdlets and the differences in their output.


A single web application might have multiple content databases; using the Get-SPContentDatabase cmdlet along with the -webapplication switch displays all the content databases associated with a particular web application (http://departments.wingtiptoys.com, shown in Figure 7-17).


FIGURE 7-17 Retrieving content databases with the -webapplication switch
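
In script form, the equivalent of Figure 7-17 looks something like the following sketch (the web application URL is the example used throughout this chapter):

Get-SPContentDatabase -WebApplication http://departments.wingtiptoys.com |
    Select-Object Name, Server, CurrentSiteCount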

It’s also possible to obtain the size of any individual database by assigning a variable to the Get-SPContentDatabase cmdlet along with the name of the individual content database and then querying the disksizerequired property to get the size in bytes:

$CDb = Get-SPContentDatabase -Identity <databasename>

$CDb.disksizerequired

This number can be divided by 1 GB to get the size in gigabytes:

$CDb.disksizerequired/1GB


Exam Tip

The need for a system administrator to understand the properties of an object in PowerShell is more important than ever. Be familiar with the syntax required to both retrieve and alter these properties.



Important

There are certainly more sophisticated ways of retrieving the size of each content database, even grouping them by their associated web applications and URLs. This example was meant to be a very basic walkthrough.
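
As one such example, the following hedged sketch lists every content database in the farm with its owning web application and its size in gigabytes:

Get-SPContentDatabase | Sort-Object DiskSizeRequired -Descending |
    Select-Object Name,
        @{Name="WebApplication"; Expression={$_.WebApplication.Url}},
        @{Name="SizeGB"; Expression={[math]::Round($_.DiskSizeRequired/1GB, 2)}} |
    Format-Table -AutoSize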


Monitoring individual site collections via Windows PowerShell

Retrieving the size of a single site collection in Windows PowerShell is much less complicated than retrieving the size of an entire content database. The Get-SPSite Windows PowerShell cmdlet can be used along with a site collection’s URL to retrieve its information. The size of the site collection is returned by using the usage property, as shown here:

$site = get-spsite -identity http://departments.wingtiptoys.com

$site.usage
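
The usage property returns a structure rather than a single number; its Storage member holds the site collection size in bytes. A minimal sketch converting it to gigabytes:

$site = get-spsite -identity http://departments.wingtiptoys.com
[math]::Round($site.usage.storage/1GB, 2)    # site collection size in GB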

Monitoring site collection content

Site collection administrators can monitor the consumption of storage within their respective site collection by using the new Storage Metrics page. This page is located in the Site Collection Administration section of Site Settings, and shows a graphic representation of all content within the current site collection (see Figure 7-18).


FIGURE 7-18 Storage metrics for a site collection

This report is far from one dimensional; from here, a site collection administrator can drill down into the content of an individual site collection, retrieving its storage metrics.

Monitor SharePoint hybrid cloud deployments

SharePoint hybrid is the sum of two different systems and the component functionality that binds them together. Items that might be hybridized between the two environments currently include cloud hybrid search, OneDrive for Business, hybrid team sites, Business Connectivity Services (BCS), and hybrid profiles.

Up to this point, monitoring of these two systems has been done individually:

- On-premises SharePoint installations can be monitored by using either out-of-the-box toolsets (ULS logs, event logs, and so on) or more advanced components, such as System Center 2012 R2 Operations Manager or later.

- SharePoint Online installations have had little in the way of monitoring from a systems standpoint.

Microsoft Insights and Telemetry services will allow the SharePoint administrator to effectively monitor both on-premises and online versions of SharePoint, providing both systems-based and user analysis in one interface.


Important

At the time of publication for this Exam Ref, Microsoft SharePoint Insights has not yet been rolled out for use in a SharePoint 2016 hybrid environment. Once this functionality is made available, the supporting documentation will be found in the TechNet article entitled “Microsoft SharePoint Insights” at https://technet.microsoft.com/library/86e0fc90-0ef8-4c22-9d3b-7af42bf882f1.


Skill: Tune and optimize a SharePoint environment

Creating an effective SharePoint environment isn’t a one-size-fits-all task. A careful examination of how the farm is intended to be used often exposes perceived weaknesses in the original design requirements. Add to that any requirements changes placed on the system by the user base, and you have a situation that is ripe with tuning potential.

The tuning and optimization portion of your project is the chance for you to tweak the underlying configuration of the farm, enabling you to both enhance performance metrics and avoid any limitations placed on the system by its original design.

Plan and configure SQL optimization

SharePoint administrators occasionally assume the role of itinerant SQL administrators, for one of two reasons:

- The performance of the data tier directly affects the performance of an entire SharePoint farm, and SharePoint has very specific SQL requirements.

- If the organization has an SQL database administrator, there is a chance that he or she has never before been required to support the data tier of a SharePoint environment.

Knowing the behaviors, maintenance requirements, and performance characteristics of your content, configuration, and service application databases enables you to more clearly relate your desired strategy for long-term performance and growth goals to the SQL team.

Choosing a storage type

The type of storage configuration chosen for the SQL data tier will have a direct bearing on the performance of the completed SharePoint farm. In a server-based environment, storage is most often attached either directly to the server using direct attached storage (DAS) or attached via a storage area network (SAN). In either type of storage implementation, the way in which these disks are organized and grouped together can have a direct bearing on performance.


Need More Review?

Network attached storage (NAS) is supported only for Remote Blob Storage (RBS) in SharePoint 2016, and then only if the time to first byte in a response is less than 40 ms for more than 95 percent of the time. For more information about supported limits in SharePoint 2016, review the TechNet article entitled “Software boundaries and limits for SharePoint Server 2016” at https://technet.microsoft.com/library/6a13cd9f-4b44-40d6-85aa-c70a8e5c34fe(v=office.16).


Within a SharePoint farm, the design of the farm and which service applications or functions are to be implemented determine what type of storage arrangement should be chosen. There are several configuration items that should be considered:

- Is the service application or function more sensitive to read or write speeds? Is there a balance to be had between the two?

  - On an Internet site on which people consume a lot of data, but that does not change often, you might want to focus on read speeds for your storage.

  - For a series of collaboration sites in which the data itself is changing on a regular basis, you might choose to balance read and write speeds.

  - For a service application such as Search or for the TempDB of an SQL instance, you might want to focus on write speeds for your storage.

- Is the storage mechanism chosen appropriate for the content being stored?

  - A RAID-5 storage array containing five 600-GB drives could store 2.4 TB of data (600 GB would be lost to maintain data parity). This array would be capable of withstanding a single drive failure and would have excellent read characteristics but less than optimal write performance characteristics.

  - A RAID-10 array using the same drives would require a total of eight 600-GB drives to provide the same amount of storage (2.4 TB for the data; 2.4 TB mirrored). This array would theoretically be able to withstand the failure of up to four drives (but only one per mirror set), and would offer superior read and write performance characteristics.


Important

RAID configurations are discussed in the next section.


RAID configuration levels

Redundant array of independent disks (RAID) is a technology that uses either hardware or software to group and organize hard drives into one or more volumes. The RAID level chosen can meet one of two possible objectives:

- Redundancy: Allows for the grouping of drives to be able to withstand the failure of one or more individual drives within the group.

- Performance: Altering the arrangement and configuration of drives within the group results in performance gains.


Important

Although you can choose to use software RAID in a production SharePoint farm, it is not recommended because the operating system of the server that hosts the storage is itself responsible for maintaining the RAID configuration. This maintenance consumes both memory and processor resources on the host system.


There are four RAID levels commonly used within the storage subsystems of a SharePoint farm (particularly within the data tier): 0, 1, 5, and 10.

- RAID Level 0 (striping): This array type distributes the reads and writes across multiple physical drives (or spindles).

  - Performance: This arrangement offers the absolute best performance for both reads and writes in your storage subsystem.

  - Redundancy: This arrangement has absolutely no tolerance for any individual drive failures; if a single drive fails, the entire array is destroyed.

- RAID Level 1 (mirroring): This array type uses an identical pair of disks or drive sets to ensure redundancy.

  - Performance: This arrangement offers the same read performance as RAID Level 0 (assuming the same number of physical disks or spindles), but write speed is reduced as the number of input/output (I/O) write operations per disk is doubled.

  - Redundancy: This arrangement can withstand the failure of a single drive or an entire drive set.

- RAID Level 5 (block level striping with distributed parity): This array type distributes reads and writes across all drives, but also writes parity in a distributed fashion across all drives.

  - Performance: This arrangement offers the same read performance as RAID Level 0, but it incurs a fairly steep write penalty as the parity operation increases write overhead.

  - Redundancy: This arrangement can withstand the failure of a single drive within the drive set.

- RAID Level 10 (striped mirror): This array type (known as a nested or hybrid RAID) combines RAID Levels 0 and 1 together, providing a high-performance and high-resiliency drive arrangement.

Performance prioritization

Within an SQL data tier, there are four distinct groupings of databases and files you should consider from a performance perspective. The assignment of these groupings to different storage types can have a dramatic effect on performance.

Although you could theoretically put all your databases on RAID-10 disk sets, doing so could be wasteful from a cost standpoint, providing limited benefit in some cases. Conversely, assigning write-heavy databases to a RAID-5 disk set would result in a heavy performance penalty.

From a performance standpoint, then, the four groupings of storage to consider are (in terms of priority from highest to lowest) as follows:

- TempDB files and transaction logs

  - If possible, assign these to RAID-10 storage.

  - Allocate dedicated disks for TempDB.

  - The number of TempDB files should be equal to the number of processor cores (hyperthreaded processors should be counted as one core; see the sketch after this list).

  - All TempDB files should be the same size.

  - An average write operation should require no more than 20 ms.

- Database transaction log files

  - These should be on a separate volume from the data files.

  - If possible, assign them to RAID-10 storage.

  - These are write-intensive.

- Search databases

  - If possible, assign these to RAID-10 storage.

  - These are write-intensive.

- Database data files

  - These can be assigned to RAID-5 storage with the understanding that writes might be slower (for better performance, consider using RAID-10 storage).

  - These are read-intensive, especially useful for Internet-facing sites.
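
As referenced in the TempDB guidance above, a quick way to count physical cores on the database server (counting hyperthreaded processors once) is the following sketch:

# Physical core count; use this as the number of TempDB data files
(Get-WmiObject -Class Win32_Processor | Measure-Object -Property NumberOfCores -Sum).Sum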

Pregrowing content databases and logs

Pregrowth is the act of preemptively growing a content database (or its associated log file) to a designated initial size. This size can be a generous estimate of how big you expect the database or log to grow, because the database administrator (DBA) can shrink the files somewhat if not all of the space is eventually used.

Pregrowing a database has two benefits:

- A reduction in overall I/O: A database that has not been pregrown or that grows in very small increments has to be expanded every time data is added, resulting in additional I/O load on the disk array and server.

- A reduction in data disk fragmentation: Small or frequent incremental data growth can result in fragmentation, reducing performance.


Important

The SQL model database (part of the system databases) can be used to control the initial size of newly created content databases. If you choose to do so, just remember that any new service application, configuration, or other databases will all echo the model database’s initial size. Often, SharePoint administrators will configure the entire farm first (Central Administration/Configuration/Service Applications) and then alter the model database prior to creating content databases.


Pregrowing the database is done from within SQL Server Management Studio, as there is no way to configure initial database size from within Central Administration. Altering the initial size of a database is done by simply opening the properties of the content database and choosing a new initial size number, as shown in Figure 7-19.


FIGURE 7-19 Altering the Initial Size setting of a database (pregrowth)
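
If you would rather script the change than use the SSMS properties dialog box, a minimal sketch using the SQL Server PowerShell module’s Invoke-Sqlcmd cmdlet follows; the instance, database, and logical file names are hypothetical:

# Pregrow the data file of a content database to 50 GB
Invoke-Sqlcmd -ServerInstance "SQL01" -Query @"
ALTER DATABASE [WSS_Content_Departments]
MODIFY FILE (NAME = 'WSS_Content_Departments', SIZE = 50GB);
"@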

Configuring content database autogrowth

Autogrowth is the amount by which a database or its log grows after it reaches its current size limit. When a SharePoint content database is created from within Central Administration, its default autogrowth rate is set to 1 MB.

Such a configuration is far from ideal; if you had a database that was at its current size limit and you added a 10 MB file, the database file would have to be grown a total of 10 times before it would have enough room to store the file. Imagine how much I/O this could generate multiplied over 1,000 files!


Important

The SQL model database (part of the system databases) cannot be used to control the autogrowth rate of newly created content databases. Get in the habit of altering the initial size and autogrowth metrics for a content database after its initial creation.


Configuring the autogrowth number for a database is done from within SQL Server Management Studio, as there is no way to configure this growth from within Central Administration. Altering the autogrowth of a database is done by simply opening the properties of the content database and choosing a growth number (in MB) or growth percentage, as shown in Figure 7-20.


FIGURE 7-20 Altering the autogrowth values for a database
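
The same approach can script the autogrowth change; the sketch below (names again hypothetical) sets a fixed 512-MB growth increment instead of the 1-MB default discussed earlier:

# Set a fixed 512 MB autogrowth increment on the data file
Invoke-Sqlcmd -ServerInstance "SQL01" -Query @"
ALTER DATABASE [WSS_Content_Departments]
MODIFY FILE (NAME = 'WSS_Content_Departments', FILEGROWTH = 512MB);
"@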


Important

It is never a good idea to limit the maximum file size of a SharePoint content database or its associated transaction log. Doing so can have unintended results and appear as an error to your users if the database attempts to exceed a hard limit.



Exam Tip

Choosing to either adjust autogrowth rates or pregrow a database might trigger the SharePoint Health Analyzer rule “Databases used by SharePoint have large amounts of unused space.” This message can be safely ignored because you intend to eventually fill the available space with data.


Advanced content database performance

As content databases grow, their overall size can cause performance degradation. Depending on the content present in the database, separating the database into multiple smaller content databases might be impractical.

One of the potential solutions to this issue is to split a larger content database file into multiple smaller files that are still part of the same database. If you decide to go with this approach, spreading these files across separate physical disks could result in performance improvements, due to better I/O metrics.

As is the case with the TempDB database, the number of data files for any split content database should be less than or equal to the number of processor cores present on the database server. If hyperthreaded processors are used, each should be counted as a single core.


Important

Choosing to split content databases across multiple database files has a side effect where SharePoint administration is concerned: SharePoint Central Administration cannot be used to back up and restore a content database that is composed of multiple files. After they are split, the databases must be backed up and restored from SQL Server because SharePoint does not understand how to restore multiple files to the same content database.


Implement database maintenance rules

Errors or inconsistencies in the data tier of a SharePoint farm environment can have dramatic effects on the performance of the farm as a whole. Regular maintenance from a database standpoint results in a more stable and better performing SharePoint experience for your users, often resulting in performance gains without the need for additional equipment or reconfiguration.

Health Analyzer rules

In previous versions of SharePoint, it was often necessary for either the SharePoint administrator or SQL DBA to perform this behind-the-scenes maintenance from within SQL Server Management Studio. Fortunately, the newer versions of SharePoint have new Health Analyzer rules that address both defragmentation and statistics maintenance, removing these administrative tasks as regular maintenance items.

Health Analyzer rule definitions can be found in Central Administration, in the Monitoring, Health, Review Rule Definitions menu. Within the Performance section, there are three rules that are all enabled and set to automatically repair any related issues on a daily basis (see Table 7-5).


TABLE 7-5 Health rules for database indexing and statistics

Plan for capacity software boundaries

Boundaries are absolute limits within SharePoint that were created by design and cannot be exceeded. Although these boundaries are few in number when compared with the sheer quantity of options and settings available, they shape the design of a SharePoint infrastructure.

This boundary structure is present in many of the logical components of a SharePoint farm. Although not present at each level, boundaries exist at the following levels:

- Web applications

- Content databases

- Site collections

- Lists and libraries

- Search

- Business Connectivity Services

- Workflows

- PerformancePoint Service

- Word Automation Service

- Office Online Service

- Project Server

- SharePoint Add-ins

- Distributed Cache Service

- Miscellaneous limits


Need More Review?

For more detail on the boundaries, supported limits, and thresholds present in any given hierarchical level, see the TechNet article “Software boundaries and limits for SharePoint Server 2016” at https://technet.microsoft.com/library/cc262787(v=office.16).aspx.



Exam Tip

Although there are dozens of supported limits and boundaries in SharePoint, there are a few that every administrator should know by heart. Items such as maximum file size, zones in a farm, and crawl document size limits are all good metrics to be familiar with.


Estimate storage requirements

Because SharePoint is heavily dependent on SQL for its storage needs, the proper allocation of storage resources is a critical design element for the SharePoint farm. This design can be broken down into two major components: storage and I/O operations per second (IOPS).

Storage variables

Storage is simply the amount of available space configured for a particular database. If the database happens to be a content database, the overall size of the database can vary dramatically based on two features: recycle bins and auditing.

Recycle bins are enabled by default at both the site (web) and site collection (site) levels. A document that is deleted from a site occupies space in the associated content database until it is deleted from both the first- and second-stage recycle bins. If you foresee the need to delete many documents in the interest of reclaiming space, the documents must be deleted from both recycle bins.

Auditing can place a lesser storage demand on a content database. If you expect to use auditing in a particular content database, try to restrict the levels at which it is enabled rather than enabling auditing on entire site collections.


Exam Tip

Recycle bins are some of the most straightforward and most misunderstood components in SharePoint. Knowing and understanding how a document moves from one stage recycle bin to another is key to understanding how documents that are “hidden” might be consuming space.


I/O operations per second

IOPS is the measure of how many input and output operations per second are available from your I/O subsystem (storage). The storage configuration influences both the read and write speeds available for use.

Stress testing a storage subsystem enables you to know the limits of your storage and also gives you an opportunity to tune it to your requirements. There are three main tools used for this purpose (see Table 7-6), each of which is free of charge.


TABLE 7-6 I/O subsystem testing tools

Estimating configuration storage and IOPS requirements

The SharePoint configuration database and Central Administration content database have fairly meager storage requirements. Both databases use a negligible amount of space; you can safely plan for less than 1 GB each for the configuration database and Central Administration content database storage. Although the configuration database itself will not grow to a great degree, the supporting transaction log can grow to be quite large and should be backed up (for truncation purposes) on a regular basis.

Estimating service application storage and IOPS requirements

Service applications vary wildly in terms of storage and IOPS requirements. The largest consumer of service application resources is Search, consuming the lion’s share of available storage and IOPS resources. At the other end of the scale are the State, Word Automation, and PerformancePoint Service applications, each of which requires minimal IOPS and approximately 1 GB of allocated storage.


Need More Review?

Detailed information concerning performance and scaling metrics is given in the TechNet article “Storage and SQL Server capacity planning and configuration (SharePoint Server 2016)” at https://technet.microsoft.com/library/cc298801(v=office.16).aspx.


Plan and configure caching and a caching strategy

Caching within SharePoint 2016 is an effective mechanism for increasing the performance of page and content delivery to the requesting user. As stated earlier in this chapter, SharePoint uses a combination of three distinct technologies to deliver this enhanced performance: ASP.NET output cache, BLOB cache, and object cache.


Important

Some of the following configurations involve altering the Web.config of a web application. When this file is saved after having been changed, it automatically recycles its associated web application, potentially disrupting service to your users. It is advisable to make these configuration changes after hours.


Planning and configuring the ASP.NET output cache

The output cache present in SharePoint 2016 stores several different versions of a rendered page; these versions are permissions dependent, based on the permissions level of the person who is attempting to view the page. Settings for this cache can be configured at the site collection and site levels, and also configured for page layouts. Additionally, the Web.config file for a web application can be altered with the output cache profile settings; these settings will then override any settings made at the site collection level (or below).


Important

SharePoint implements the ASP.NET output cache, but refers to it simply as the page output cache in most site settings menus; from this point forward, we’ll use this reference.


Cache profiles

Prior to enabling the page output cache, you can review the site collection cache profiles that will be used in the output cache in the site settings of your site collection. In Site Settings, select the Site Collection Administration section; under Site Collection, select Cache Profiles.

Four profiles exist by default:

- Disabled: Caching is not enabled.

- Public Internet (Purely Anonymous): Optimized for sites that serve the same content to all users with no authentication check.

- Extranet (Published Site): Optimized for a public extranet in which no authoring takes place and no Web Parts are changed by the users.

- Intranet (Collaboration Site): Optimized for collaboration sites in which authoring, customization, and other write-intensive operations take place.

You can also create a new cache profile if none of these suits your needs.


Exam Tip

The use of the page output cache requires that the Publishing Infrastructure feature be active for the site collection and that the Publishing feature be active for the particular site. After the Publishing feature is enabled, so, too, is the output cache (using default settings).


Enabling the page output cache (web application level)

Enabling the page output cache at the web application level overrides all other page output cache settings at the site collection, site, or page layout levels.

To enable the page output cache, follow these steps:

1. Open Internet Information Services (IIS) Manager and select the Website that you want to configure.

2. Select Web.config and then open with the editor of your choice.

3. Search for the OutputCache Profiles XML entry:

<OutputCacheProfiles useCacheProfileOverrides="false" varyByHeader=""
varyByParam="*" varyByCustom="" varyByRights="true" cacheForEditRights="false" />

4. Change the useCacheProfileOverrides attribute from false to true, then save and close the Web.config file.


Important

Saving this change will result in an outage while the site is restarted in IIS.



Exam Tip

Although setting the page output cache at the web application level is highly effective, changes made at this level have to be made on the Web.config files of each front-end server and should be included in your farm documentation. Unless there is a compelling reason not to, it is recommended to instead enable configuration of the page output cache at the site collection level, which requires no system outage for additional changes.


Enabling the page output cache (site collection level)

Enabling the page output cache within a publishing site collection is done within the Site Collection Administration menu.

1. In Site Settings, select the Site Collection Administration section and then select Site Collection Output Cache.

2. In the Output Cache section, choose to enable or disable the output cache.

3. In the Default Page Output Cache Profile section, you have the opportunity to choose from the cache profiles mentioned earlier:

A. For the Anonymous Cache Profile: Choose from Disabled, Public Internet, Extranet, or Intranet.

B. For the Authenticated Cache Profile: Choose from Disabled, Extranet, or Intranet.

4. Page Output Cache Policy enables you to delegate control of the cache policy:

A. Whether publishing subsite owners can choose a different page output cache profile.

B. Whether page layouts can use a different page output cache profile.

5. Debug Cache Information (optional) lets you enable debug cache information on pages for troubleshooting cache contents.

Enabling the page output cache (site level)

If previously delegated by the site collection administrator, page output cache settings can be configured at the subsite level from the Site Administration menu.

1. In Site Settings, select the Site Administration section and then select Site Output Cache.

2. On the Publishing Site Output Cache page, you can choose the Page Output Cache Profile:

A. Anonymous Cache Profile can either inherit the parent site’s profile or select a profile (Disabled, Public Internet, Extranet, or Intranet).

B. Authenticated Cache Profile can either inherit the parent site’s profile or select a profile (Disabled, Extranet, or Intranet).

3. Optionally, you can select the check box to apply these settings to all subsites.

Enabling the page output cache by page layout

If previously delegated by the site collection administrator, page output cache settings can be configured on a per-page layout basis from the Master Pages And Page Layouts menu.

1. In Site Settings, in the Web Designer Galleries section, select the Master Pages And Page Layouts section.

2. On the Master Page Gallery page, choose a page layout and then select its drop-down menu.

3. Select Edit Properties to open the Properties page. Scroll to the bottom, where you can select an authenticated cache profile, an anonymous cache profile, or both.

4. On the ribbon, in the Commit section, click the Save icon to close the settings.

Planning and configuring the BLOB cache

The BLOB cache is used to prestage branding (.gif, .jpg, .css, .js), image, sound, video, and other files that are stored in SQL as BLOBs. This is a disk-based caching technique that stores these items on the web tier servers within your farm.

The purpose of storing these items on the web tier is to directly benefit from not having to retrieve these larger files from the content databases stored on the SQL data tier. This caching mechanism is enabled or disabled on each web tier server on a per-web application basis by modifying Web.config and adding the following XML:

<BlobCache location="{drive letter}:\BlobCache16"
  path="\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|themedbmp|themedcss|themedgif|themedjpg|themedpng|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv|ogg|ogv|oga|webm|xap)$"
  maxSize="10" enabled="false" />

There are a few settings in this piece of XML that are of interest.

Image Although both the location and file folder details can be changed, the change should be uniform on all web tier servers.

Image The path item does not indicate the path on the file system, but instead the types of files (BLOB) that can be stored on the file system.

Image The maxSize entry indicates the size in gigabytes (GB) for the BLOB cache; any changes to this value should be made uniformly on all web tier servers.

Image The maxSize value should never be less than 10 GB, but can (and should) be grown to roughly 20 percent bigger than the expected BLOB content.

Image Changing the enabled value from false to true activates the BLOB cache (a scripted approach is sketched below).
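Rather than editing each Web.config by hand, one option is the SPWebConfigModification API, which records the change in the configuration database and pushes it to every web server. The following is a minimal sketch, not a definitive implementation; the web application URL is hypothetical, and the Owner string is an arbitrary marker of your choosing:

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$webApp = Get-SPWebApplication "http://intranet.contoso.com"

# Describe the attribute change to apply to the BlobCache element
$mod = New-Object Microsoft.SharePoint.Administration.SPWebConfigModification
$mod.Path  = "configuration/SharePoint/BlobCache"
$mod.Name  = "enabled"
$mod.Value = "true"
$mod.Owner = "BlobCacheTuning"
$mod.Type  = [Microsoft.SharePoint.Administration.SPWebConfigModificationType]::EnsureAttribute

# Register the change, then apply it to all web servers in the farm
$webApp.WebConfigModifications.Add($mod)
$webApp.Update()
[Microsoft.SharePoint.Administration.SPWebService]::ContentService.ApplyWebConfigModifications()

A side benefit of this approach is that SharePoint reapplies recorded modifications when new web servers join the farm, keeping the setting uniform.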

Planning and configuring the object cache

The object cache in SharePoint 2016 is used to store objects in the memory of the front-end SharePoint farm servers, thus reducing the amount of traffic between these servers and the SQL data tier. These objects—which include lists and libraries, site settings, and page layouts—are used by the Publishing feature when it renders webpages on the site.


Important

The use of the object cache requires that the Publishing feature be active on your site. After the Publishing feature is enabled, so, too, is the object cache (using default settings).


Activating the object cache

The object cache relies on a series of settings, which can be found at Site Settings, Site Collection Administration, Site Collection Object Cache:

1. In Site Settings, select the Site Collection Administration section and then select Site Collection Object Cache.

2. In the Object Cache Size section, specify the maximum cache size in MB (default is 100 MB). Remember that this cache space comes directly out of RAM of each server in your web tier.

3. In the Object Cache Reset section, you will normally leave these values cleared. From here, you can not only flush the object cache of the current server (by selecting Object Cache Flush) but also that of the farm (by selecting Force All Servers In The Farm To Flush Their Object Cache).

4. In the Cross List Query Cache Changes section, you can configure the behavior of cross list queries, such as Content Query Web Parts. You have the choice of either precaching the results of such a query for a specified period of time (the default is 60 seconds) or forcing the server to check for changes every time a query is performed (which is more accurate from a results standpoint, but results in slower performance).

5. In the Cross List Query Results Multiplier section, you can choose a multiplier value ranging from 1 to 10 (3 is the default). This number should be increased if your site has unique security applied to many lists or libraries, but it can also be reduced if your site does not have as many unique permissions. A smaller multiplier uses less memory per query.


Important

The object cache size can also be controlled at the web application level by altering the Web.config <ObjectCache maxSize="100" /> line.


Tune network performance

Although there are significant networking improvements in both the Windows Server 2012 R2 and Windows Server 2016 platforms, some minor alterations to your SharePoint 2016 network environment can result in significant performance gains.

Domain controllers and authentication

A SharePoint Server 2016 farm can potentially place a significant authentication load on domain controllers (DCs) within your network. As general guidance, Microsoft recommends that you deploy one DC for every three web tier servers present in your SharePoint farm.


Important

It should be noted that the DC for this task should not be a read-only DC.


Separating client and intrafarm network traffic

A SharePoint Server 2016 environment can start as a single server and grow into an environment consisting of several servers in a MinRole configuration. Depending on the arrangement chosen, a SharePoint environment could have client and interserver traffic traversing the same interfaces, potentially disrupting or slowing services for SharePoint users.

Consider installing two network adapters on each front-end server:

Image One connected to a client subnet/virtual local area network (VLAN) that serves client requests.

Image The other connected to an intraserver subnet/VLAN that enables interserver connectivity.


Image Exam Tip

Network administration is not a core requirement for being a SharePoint administrator; however, knowing how concepts such as subnets and VLANs can work to separate client and server communications might be key to understanding a very simple way to improve SharePoint connectivity and performance.


Plan and configure Zero Downtime Patching

In Chapter 1 we discussed how an environment could be made resilient using at least two servers for each role (three for the distributed cache service), as shown in Figure 7-21. Designing a farm with resiliency in mind not only improves farm availability, but also provides the opportunity to maintain the farm without the need to provide an outage window.

Image

FIGURE 7-21 Designing an environment for resiliency using a five-tier server farm.

In SharePoint 2013, it was possible to approximate a zero downtime schedule for software updates to SharePoint 2013. SharePoint update binaries could be applied to a server that had been temporarily removed from rotation; when added back in, that server would continue to run in backward compatibility mode. This process could be completed on each SharePoint server in the farm, resulting in no downtime.

Once the binaries had been installed to the SharePoint farm, the next step was to update the farm as a whole. At this point, using the SharePoint Configuration Wizard would not be the preferred option, as it would upgrade content databases sequentially; instead, using multiple instances of the Upgrade-SPContentDatabase cmdlet with the -UseSnapshot option would allow you to upgrade the databases in parallel (shortening the overall upgrade window) and allowing read-only access to the snapshot.
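As a sketch of that SharePoint 2013-era approach, each command below would run in its own PowerShell session so the databases upgrade in parallel (the database names are hypothetical):

# Session 1: upgrade one content database, serving reads from a snapshot
Upgrade-SPContentDatabase -Identity "WSS_Content_TeamSites" -UseSnapshot -Confirm:$false

# Session 2 (separate window): upgrade another database at the same time
Upgrade-SPContentDatabase -Identity "WSS_Content_Portal" -UseSnapshot -Confirm:$false

Because each session owns its own upgrade, the overall window shrinks roughly in proportion to the number of parallel sessions, subject to available SQL I/O.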

During this process, the SharePoint farm is available from a read-only standpoint during the upgrade, but cannot be used in a read-write capacity until the process is complete; thus, it doesn’t technically meet the definition of Zero Downtime.


Need More Review?

The -UseSnapshot option will not be used in SharePoint 2016 Zero Downtime operations, as it places the content databases in a read-only state. For more information, review the TechNet blog article by Stefan Goßner called “SharePoint Server 2016 Zero-Downtime Patching demystified” at https://blogs.technet.microsoft.com/stefan_gossner/2016/04/29/sharepoint-2016-zero-downtime-patching-demystified/.


Preparing for a Zero Downtime update

Given the resiliency requirements for this process, you should inspect the current farm status before proceeding. The following upgrade conditions should be met:

Image All front-end servers should be in a functioning load balancer rotation.

Image During the upgrade process, the SharePoint farm administrator will need to work closely with the administrator of any external load balancer appliance.

Image All farm servers should be operating properly.

Image Perform checks such as reviewing any errors in Event Viewer and determining if enough disk space is available.

Image For the Search role servers, use the Get-SPEnterpriseSearchStatus cmdlet (see the sketch after this list) or the Manage Service Applications menu within Central Administration to review the status of the Search service application.

Image All databases should be active and operating properly.

Image Make sure that databases are set to read and write in SharePoint and that they are online.

Image Check the Health Analyzer for any database orphans.

Image Notify the SQL DBA that your upgrade might increase I/O demands on both the SQL and storage (SAN) subsystems.
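For the Search status check referenced in the list, a minimal sketch follows; it assumes a single Search service application in the farm:

# Review the state of every Search component; all should report Active
$ssa = Get-SPEnterpriseSearchServiceApplication
Get-SPEnterpriseSearchStatus -SearchApplication $ssa | Select-Object Name, State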


Important

Never proceed with any upgrade (whether it’s build-to-build or version-to-version) on a farm that is in an inconsistent state.


Using the in-place upgrade method (with backward compatibility)

In a Zero Downtime cycle, the Web, App, and Distributed cache tier servers are individually removed from rotation, patched, and then returned to service. Search is also included in this cycle, but it is the one component that does have a minimal downtime window (while the Search service application is suspended).

The Zero Downtime cycle is broken into two major phases:

Image Update phase During this phase, the contents of the SharePoint update are installed to each server in the farm, in a sequence that starts with the Front-end servers and continues through Application, Search, and Distributed cache servers.

Image Upgrade phase During this phase, member servers in the farm are upgraded, starting with the Application server hosting Central Administration and continuing through the Front-end, Search, and other Application servers.


Important

Neither the Configuration Wizard nor the Upgrade-SPContentDatabase cmdlet should be run in the Update phase.



Need More Review?

The process for updating an entire SharePoint 2016 farm using Zero Downtime Patching can be complex, depending on the roles you’ve enabled. For an in-depth understanding of exactly what needs to occur during this process, review the TechNet article entitled “Install a software update for SharePoint Server 2016” at https://technet.microsoft.com/library/ff806338(v=office.16).aspx.


Skill: Troubleshoot a SharePoint environment

Now that users and departments are being migrated to the SharePoint environment, the focus of the SharePoint administrative team shifts to the monitoring and troubleshooting of the new environment. Monitoring thus far has been a reactive tactic, as production SLAs might not yet be in place; as a result, the current user load (and any remaining configuration items) has the potential to create issues that will require attention.

Administering the new farm from a proactive versus reactive standpoint is critical at this juncture. Establishing a baseline for how the farm will perform provides the SharePoint administrative team with the ability to quickly identify any performance or configuration anomalies before they become issues observed by the user base as a whole.

Establish baseline performance

Immediately after your staging environment (or production, if you don’t have the resources for staging) has been finalized, you have an opportunity to define the baseline performance of your SharePoint environment.

At this point, we are not talking about performance testing (what the system is capable of when it is working at maximum capacity), but rather what the nominal or expected operation of the system looks like from a logging standpoint. The optimal goal is to have a statistical sampling of what the environment looks like as it adjusts to varying levels of user demand on a regular interval. For instance, odds are that a system is much busier at 9:00 on Monday morning than it is at 9:00 on Saturday night (assuming a standard work week).

Baselining your SharePoint environment

A core SharePoint 2016 MinRole-compliant farm includes Front-end, Application, and Distributed cache servers. Moving beyond the core configuration, farm services can be provided by servers that perform Search, Workflow Manager, Office Online, and Custom roles within the farm.

Performing a baseline capture of the chosen architecture prior to releasing it to production status allows the administrator to glean some understanding of how each of the service configurations performs in the completed farm, both from a duration and initial capacity standpoint.


Important

The baseline monitoring changes here are not intended to be left running for an extended period of time, as they can be resource intensive from a processor, memory, and storage standpoint. Use these items to establish a baseline for the farm, then return the configured values to their normal settings in the early production stages of the SharePoint 2016 farm.


Configuring monitoring

Monitoring a SharePoint 2016 farm can use a combination of settings from the Monitoring section within Central Administration:

Image Diagnostic Logging Diagnostic Logging can be used to select the categories and severity of events that will be recorded to the Event log and Trace log. At this point, Event log flooding protection should be disabled, after first ensuring that enough disk space is available where the Event log is stored.

Image Timer Job Scheduling The Timer Jobs, Job Definitions menu can be used to ensure that the Microsoft SharePoint Foundation Usage Data Import job is scheduled to run every five minutes (a scripted equivalent appears after this list).

Image Diagnostic Providers While still in the Job Definitions menu, enable each of the Diagnostic Data Providers (SQL Blocking Queries is already enabled by default). The schedules for the Performance Counters - Database Servers and Performance Counters - Web Front Ends providers should already be set to run every minute.

Image Usage Data Collection On the Configure Usage And Health Data Collection page, you can choose to enable usage data collection (this should be enabled for monitoring purposes), and in the Events To Log section, the following items should be selected: Content Import Usage, Content Export Usage, Page Requests, Feature Use, Search Query Use, Site Inventory Usage, Timer Jobs, and Rating Usage.
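The Usage Data Import rescheduling mentioned above can also be scripted; a minimal sketch follows, using the job's internal name (job-usage-log-file-import):

# Reschedule the Microsoft SharePoint Foundation Usage Data Import timer job
$job = Get-SPTimerJob "job-usage-log-file-import"
Set-SPTimerJob -Identity $job -Schedule "every 5 minutes between 0 and 59"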


Need More Review?

In addition to the SharePoint items that can be recorded in the usage database, performance counters can be added from both an operating system and SQL standpoint using the Add-SPDiagnosticsPerformanceCounter cmdlet. For a clearer understanding of how monitoring is configured for SharePoint, review the TechNet article “Monitoring and maintaining SharePoint Server 2013” at https://technet.microsoft.com/library/ff758658.aspx.
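As a brief illustration of the Add-SPDiagnosticsPerformanceCounter cmdlet, the counters below are examples only, not a recommended set:

# Collect CPU utilization from the web tier into the usage database
Add-SPDiagnosticsPerformanceCounter -Category "Processor" -Counter "% Processor Time" -Instance "_Total" -WebFrontEnd

# Collect available memory from the database servers
Add-SPDiagnosticsPerformanceCounter -Category "Memory" -Counter "Available MBytes" -DatabaseServer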


Perform client-side tracing

Trace logs, which contain information such as stack traces and informational messages, are available on both Windows clients and Windows servers. The example shown here configures the only SharePoint functionality currently available for a client-side trace: BCS.


Important

Client-side tracing using PerfMon is only available in Windows Vista, Windows 7, Windows 8, and Windows 8.1 clients. Windows 10 does not have this functionality, which has largely been replaced by the Developer Dashboard.


Enabling a new client-side trace

Trace logging is not enabled by default, but it can be activated from within PerfMon by generating a new data collector set on a Windows client machine. Trace logging can have an effect on performance, so it is recommended that this functionality not be enabled unless required for troubleshooting efforts.

To perform client-side tracing, follow these steps:

1. Run Perfmon.exe.

2. In the left pane, expand the Data Collector Sets section and then right-click User Defined.

3. From the New menu, select Data Collector Set.

4. When the Create New Data Collector Set dialog box opens, enter a name for the set and select Create Manually (Advanced). Click Next.

5. On the What Type Of Data Do You Want To Include? page, leave the Create Data Logs option selected and select the Event Trace Data check box. Click Next.

6. On the Which Event Trace Providers Would You Like To Enable? page, select Add.

7. In the Event Trace Provider dialog box, scroll down and select the Microsoft-Office-Business Connectivity Services event trace provider and then click OK.

8. Returning to the Which Event Trace Providers Would You Like To Enable? page, verify that your new provider is shown and then click Next.

9. On the Where Would You Like The Data To Be Saved? page, you can choose a new location or leave the default. Make a note of this location and then click Finish.

Running the new client trace

Now that the trace has been configured, it is available to run as desired from within PerfMon, as follows:

1. Run Perfmon.exe.

2. In the left pane, expand Data Collector Sets and then expand User Defined, selecting your recently configured trace.

3. Right-click your data collector set and click Start.

4. Perform the BCS activities for which you want to capture trace data.

5. After you complete your activities, stop the trace by right-clicking the data collector set and selecting Stop.
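If you prefer not to click through PerfMon, the same collector can be created and controlled with the logman utility; this sketch assumes an output folder of C:\PerfLogs:

# Create a trace session against the BCS event trace provider
logman create trace "BCSClientTrace" -p "Microsoft-Office-Business Connectivity Services" -o "C:\PerfLogs\BCSTrace.etl"

# Start the trace, perform the BCS activities, and then stop it
logman start "BCSClientTrace"
logman stop "BCSClientTrace"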

Reviewing the client trace results

The results of the client trace can be reviewed from within Event Viewer. Follow these steps:

1. Run Eventvwr.exe.

2. On the Action menu, select Open Saved Log.

3. In the Open Saved Log dialog box, navigate to the location in which you specified the data should be saved when you created the data collector set.

4. Within this folder, you see one or more folders. The name of each of these subfolders begins with the machine name and then the year, month, and date in which the trace was performed. Expand this folder.

5. Within this folder is an .etl file. Open this file in Event Viewer.

6. Correlation (activity) IDs are generated on both the server and the client when items are created, updated, or deleted in external data. The Correlation ID column might not appear by default.

7. To display the Correlation ID column, on the View menu, select Add/Remove Columns.

Perform server-side tracing

Server-side tracing is captured within the trace log on the SharePoint Server. Used in conjunction with client-side tracing, it is possible to watch a particular activity from both the server’s and the client’s point of view.

Continuing with the previous example, the logging of BCS is already enabled on the SharePoint farm. To verify its logging trace level, do the following:

1. Open Central Administration and select Monitoring.

2. In the Reporting section, select Configure Diagnostic Logging.

3. Expand the Business Connectivity Services entry and ensure that Business Data is set to at least the Medium trace level.

4. Analyze the ULS log entries, looking for information about two categories (a scripted approach follows this list):

Image BDC_Shared_Services

Image SS_Shared_Service
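A scripted version of these steps might look like the following sketch; the Identity string passed to Set-SPLogLevel is an assumption based on the area and category names shown in Central Administration:

# Ensure the Business Data category traces at Medium or better
Set-SPLogLevel -TraceSeverity Medium -Identity "Business Connectivity Services:Business Data"

# Pull the last 30 minutes of ULS entries for the two BCS categories
Get-SPLogEvent -StartTime (Get-Date).AddMinutes(-30) |
    Where-Object { $_.Category -in @("BDC_Shared_Services", "SS_Shared_Service") }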

Analyze usage data

Using the built-in views provided in the logging database, you can review the different metrics that are captured in the SharePoint usage and health collection intervals. This information not only can be viewed in SSMS, but can also be exported as a comma-separated value (.csv) file to Microsoft Excel for further analysis.


Image Exam Tip

SharePoint farm administrators are becoming more and more versatile. One of the key tool sets we are learning to master is the SQL Server Management Studio (SSMS) tool. Understand how to connect to a server, run a simple query, and view the result.


To begin viewing logging data in SSMS, do the following:

1. Open SSMS and connect to the SQL instance providing data services to your SharePoint farm (see Figure 7-22).

Image

FIGURE 7-22 Connecting to SQL Server

2. Select the logging database (WSSUsageApplication, by default) and then select the plus (+) sign to show all its components. Expand Views and then select a view from which you’d want to collect information (dbo.FileUsage is shown in Figure 7-23).

Image

FIGURE 7-23 The dbo.FileUsage view

3. Right-click the view and choose Select Top 1000 Rows. The SQL Query appears in the top pane, and the lower pane shows the query results (see Figure 7-24).

Image

FIGURE 7-24 Query results for the selection

These results can now be exported to a .csv file for viewing and analysis in Microsoft Excel by choosing Save Results As in the Results window at the bottom of the console. Opening and importing the .csv file in Excel enables you to display and interact with the data (see Figure 7-25).
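The same extraction can be scripted end to end; this sketch assumes the Invoke-Sqlcmd cmdlet (part of the SQL Server PowerShell tools) is available, and the instance name and output path are hypothetical:

# Query the logging database's FileUsage view and export the rows to .csv
Invoke-Sqlcmd -ServerInstance "SQL01" -Database "WSSUsageApplication" `
    -Query "SELECT TOP 1000 * FROM dbo.FileUsage" |
    Export-Csv -Path "C:\Temp\FileUsage.csv" -NoTypeInformation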

Image

FIGURE 7-25 The results shown in an Excel file

Enable a developer dashboard

The Developer Dashboard is a tool that can be used to analyze the performance of your SharePoint pages. When enabled, this tool can be used by anyone with the Add and Customize Pages permission level (or greater).

The Developer Dashboard now appears in its own browser window, making it easier to interact with and navigate to the desired SharePoint page while still providing a dedicated view into the performance of that page.

Developer Dashboard settings

In SharePoint 2016, three values still exist for the Developer Dashboard DisplayLevel property in Windows PowerShell (Off, On, and OnDemand), but there are effectively only two settings: Off and On. When you specify On, you are really specifying On Demand, because you must select the Developer Dashboard icon on the ribbon to make the dashboard appear. If you choose OnDemand, you receive the same result.


Image Exam Tip

The Developer Dashboard is an indispensable tool for a SharePoint troubleshooter, especially because it can retrieve correlation IDs and their meaning from the back-end server. Understand how to enable this tool via Windows PowerShell and also how to activate, deactivate, and use this tool at a basic level.


Enabling Developer Dashboard using Windows PowerShell

To enable Developer Dashboard, you have to set a variable for the Developer Dashboard Settings object and then change its Properties:

$devdash = [Microsoft.SharePoint.Administration.SPWebService]::ContentService.DeveloperDashboardSettings
$devdash.DisplayLevel = "On"
$devdash.Update()

To reverse the change, all you have to do is set the DisplayLevel property to a value of "Off" and then do another Update().

Activating Developer Dashboard from a SharePoint page

Once the Developer Dashboard is enabled, a new icon appears on the header to the right of Share, Follow, and Focus on Content links, as shown in Figure 7-26.

Image

FIGURE 7-26 The Developer Dashboard icon

When this icon is selected, the Developer Dashboard appears in a new browser window. Selecting any of the HTTP GET Requests displays the overall metrics required for the particular page to be rendered, as shown in Figure 7-27.

Image

FIGURE 7-27 Developer Dashboard, displaying the request and summary information

Analyze diagnostic logs

Each server in a SharePoint farm maintains a series of diagnostic logs known as ULS logs, which are stored by default in %ProgramFiles%\Common Files\Microsoft Shared\Web Server Extensions\16\LOGS and are saved in 30-minute increments (the default).


Important

ULS log files are plain text files, named in the format servername-yyyymmdd-hhmm.log, where the time indicated (in 24-hour format) is the beginning time of the log.


These logs are not available to be viewed through Central Administration, but can be viewed by using the following:

Image A text editor such as Notepad

Image Windows PowerShell (see the sketch after this list)

Image Developer Dashboard

Image A third-party tool
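For the Windows PowerShell option, a minimal sketch follows; the correlation ID is a hypothetical placeholder:

# Merge the last hour of ULS entries from every farm server into one file,
# filtered to a single correlation ID
Merge-SPLogFile -Path "C:\Temp\FarmMerged.log" `
    -Correlation "8e9a13cb-1234-4321-abcd-0123456789ab" `
    -StartTime (Get-Date).AddHours(-1)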

ULS logging levels

There are six possible logging levels available to be reported within the trace log: none, unexpected, monitorable, high, medium, and verbose. Table 7-7 provides insight into what each level of logging entails.

Image

TABLE 7-7 ULS logging levels

Troubleshoot SharePoint hybrid cloud issues

The process required for understanding issues within a hybrid implementation is nearly identical to that required for an entirely on-premises solution. Diagnostic logging is configured using a series of counters that are used to capture metrics for both on-premises and Office 365 implementations of SharePoint.

A series of categories, shown in Table 7-8, can be monitored to gather a better understanding of issues within the SharePoint farm.

Image

TABLE 7-8 Monitoring Categories

Troubleshooting toolsets

Aside from the soon-to-be-released SharePoint Insights hybrid service (more on this in a bit), a series of tools can be used to troubleshoot SharePoint installations. These tools still focus primarily on on-premises deployments and generally provide only basic monitoring (up or down) of the relationship between on-premises and online deployments of SharePoint. These tools are shown in Table 7-9.

Image

TABLE 7-9 Troubleshooting tools

Need More Review?

For a more in-depth understanding of how to troubleshoot hybrid SharePoint environments, review the TechNet article “Troubleshooting hybrid environments” at https://technet.microsoft.com/library/dn518363.aspx.



Important

Help is on the way! The Microsoft Insights and Telemetry components are not present in the product (as of the publication date of this book), but the Microsoft Insights service is already in place, awaiting activation via cumulative update. Once this tool is in place, monitoring of both environments should become a much easier process. When this functionality is made available, the supporting documentation will be found in the TechNet article entitled “Microsoft SharePoint Insights” at https://technet.microsoft.com/library/86e0fc90-0ef8-4c22-9d3b-7af42bf882f1.


Summary

Image SLAs provide an understanding of performance between IT and the business, consisting of definitions for downtime, scheduled downtime, and monthly uptime percentages.

Image SharePoint Server 2016 farms can be monitored at the server, service application, site collection, and site levels.

Image Performance Monitor (PerfMon) captures component information about the member servers of the SharePoint farm, including information about the application itself as well as operating system metrics.

Image Data collector sets are used to gather groupings of metrics for use in performance monitoring.

Image Data collector sets can be saved as templates for regular use.

Image Page performance monitoring includes counters for the ASP.NET output cache, the BLOB cache, and the page object cache.

Image Usage and Health Data Collection provides the SharePoint administrator with an effective mechanism for gathering counters and other logging information from all farm member servers into a centralized database.

Image Three timer jobs are responsible for the collection and aggregation of logging data in a SharePoint 2016 farm: the Microsoft SharePoint Foundation Usage Data Import, Usage Data Maintenance, and Usage Data Processing jobs.

Image The Get-SPDatabase cmdlet is used to retrieve information about all databases within a SharePoint farm (configuration, content, and service application), whereas the Get-SPContentDatabase cmdlet is focused specifically on SharePoint content databases.

Image SharePoint-supported storage includes DAS and SAN. NAS is also supported, but only for RBS and only if the time to first byte in a response is less than 40 ms for more than 95 percent of the time.

Image Performance prioritization (from highest to lowest) from an SQL standpoint would include TempDB files and transaction logs, database transaction logs, Search databases, then database data files.

Image Content databases should never be configured to have a maximum size in SSMS.

Image Health Analyzer rules handle routine database maintenance such as correcting fragmented indices and outdated index statistics.

Image SQLIO, IOMeter, and SQLIOSim are tools that can be used to accurately gauge the performance potential of a disk subsystem.

Image The page output cache can be enabled at the web application, site collection, site, or page layout levels.

Image The BLOB cache stores large content on the local drive of a Front-end server.

Image The object cache stores content in the memory of the Front-end server, reducing the amount of memory available to the rest of the system.

Image Zero Downtime Patching requires resiliency at all server roles within a MinRole farm.

Image Monitoring in SharePoint for the purposes of baselines can include metrics from diagnostic logging, the rescheduling of timer jobs, the addition of new diagnostic providers, and usage and health data collection.

Image Server-side tracing is done within the ULS logs.

Image The Developer Dashboard must be enabled for use within the farm using PowerShell.

Thought experiment

In this thought experiment, demonstrate your skills and knowledge of the topics covered in this chapter. You can find the answer to this thought experiment in the next section.

You are in the process of completing your SharePoint farm implementation, and want to release it in a limited fashion to an early group of users.

1. Are you providing a prerelease SLA to these users? What SLA considerations might you document?

2. What steps might you take to ensure that monitoring can be data mined for further analysis?

3. From a troubleshooting standpoint, what tools do you have at the ready to evaluate a server (or servers) that are causing issues?

Thought experiment answer

1. An SLA (even a preliminary one) is never a bad idea, as it sets the stage for an understanding between IT, the SharePoint administration team, and the business users participating in the early adopter program. Perhaps a minimal SLA of 95 percent uptime would be an appropriate starting place for the farm.

2. Setting up the requirements for the Usage and Health Data Collection would provide a mechanism for capturing baseline information as well as any performance eccentricities with respect to the SharePoint 2016 environment.

3. Although the Usage and Health Data Collection functionality is useful, it captures a lot of data that must be parsed to be useful. From a troubleshooting standpoint, a combination of data collector sets in PerfMon and logging events in the ULS logs will allow the administration team to time-box any issues, retrieving only the logging information required to ad hoc troubleshoot a particular event.
