Performance Management 39th Edition

Fall 2016 Performance Management 39th Edition

Esri has implemented distributed GIS solutions since the late 1980s. For many years, distributed processing environments were not well understood, and customers relied on the experience of technical experts to identify hardware requirements to support their implementation needs. Each technical expert had a different perspective on what hardware infrastructure might be required for a successful implementation, and recommendations were not consistent. Many hardware decisions were made based on the size of the project budget, rather than a clear understanding of user requirements and the appropriate hardware technology. Many GIS implementation projects would fail due to poor system design and lack of performance management.

Esri started developing simple system performance models in the early 1990s to document our understanding about distributed processing systems. These system performance models have been used by Esri system design consultants to support distributed computing hardware solutions since 1992. These same performance models have also been used to identify potential performance problems with existing computing environments.

The Capacity Planning Tool (CPT), introduced in 2008, incorporates the best of the traditional client/server and web services sizing models and provides an adaptive sizing methodology to support future enterprise GIS operations. The capacity planning methodology is much easier to use and provides metrics to manage performance compliance during development, initial implementation, and system delivery.

This chapter introduces how these design models can be used for performance management.

System performance factors

Figure 10.1 Several key system performance factors work together to provide required user workflow productivity. A properly balanced resource investment will provide the optimum user performance.

Figure 10.1 identifies some key components that contribute to overall system performance. Software technology selection and application design drive the processing loads and network traffic requirements. Hardware and architecture selection establishes processing capabilities and how the processing loads are distributed. Network connectivity establishes infrastructure capacity for handling the required traffic loads.

Warning: Weakest system component determines overall system performance (performance chain).
Best practice: Balanced system design provides optimum user performance at lowest system cost.

Software technology factors

Software design efficiency and level of analysis establish the complexity of the application functions. Data source structure and the size and composition of the data contribute to the complexity of the information the application must work with.

Application:

  • Core software and client application efficiency.
  • Display complexity includes layers per display, features per display extent, functions used to complete the display, and display design for each map scale.
  • Display traffic
  • User workflow activity including user productivity, implementation of heavy workflow tasks, and workflow efficiency (mouse clicks to final display, communication chatter)

Data source:

  • Data source technology including DBMS (data types, indexing, tuning, scalability), file source (File format, structure, indexing, scalability), imagery (Image format, file size, indexing, pre-processing, on-the-fly processing), or cached data source.
  • Geodatabase design including table structure, dependencies, and relationship classes.
  • Data connection including SDE (direct connect, applications server connect) or file source (internal disk, direct attached, network attached).

Hardware technology factors

Hardware design and performance characteristics determine how fast the servers can do work and the volume of work they can handle at one time.

  • Workstation/application server/GIS server including processor core performance, platform capacity (servers), physical memory, network connection, graphics processing unit.
  • Data server including processor core performance, platform capacity, physical memory, and network connection.
  • Network communications including bandwidth, traffic, latency, and application communication chatter.

The system design solution must provide sufficient platform and network capacity to process software loads within peak user performance needs.

Best practice: CPT Standard Workflows provide proper processing load profile.


How is performance managed?

System architecture design provides a framework for identifying a balanced system design and establishing reasonable software processing performance budgets. Performance expectations are established based on selected software processing complexity and vendor published hardware processing capacity. System design performance expectations can be represented by established software processing performance targets. These performance targets can be translated into specific software performance milestones which can be validated during system deployment. Software processing complexity and/or hardware processing capacity can be reviewed and adjusted as necessary at each deployment milestone to ensure the system is delivered within the established performance budget.

Our understanding of GIS processing complexity and how this workload is supported by vendor platform technology is based on more than 20 years of experience. A balanced software and hardware investment, with capacity based on projected peak user workflow loads, can reduce cost and ensure system deployment success.

Figure 10.2 Performance management involves building a design solution based on appropriate workflow performance targets and managing compliance throughout design and implementation to deliver within those targets.

Most project managers clearly understand the importance and value of a project schedule in managing deployment risk associated with cost and schedule. The same basic project management principles can be applied to managing system performance risk. Figure 10.2 shows some basic concepts that can be used in managing performance.

System architecture design framework:

  • CPT provides balanced standard and custom workflow load profiles.
  • Workflow complexity assessment is used to assign reasonable software processing performance budgets.

Workflow complexity assessment:

  • Light complexity represents simple user displays with minimum functional analysis (light processing loads).
  • Medium complexity represents standard workflow performance targets that satisfy most workflows that apply best practice design standards. Medium complexity is roughly twice light complexity processing loads.
  • Heavy complexity represents workflows that include more complex map displays or data models that generate 50 percent more processing than medium complexity workflows.
  • Additional complexity selections (2x medium, 3x medium, 4x medium, …10x medium) are available for establishing much heavier performance targets.

Workflow complexity guidelines:

  • Light complexity is the minimum loads expected based on software technology selection.
  • Medium complexity would support up to 80 percent of selected software technology deployments.
  • Heavy complexity represents user workflows with more complex data models (more layers, more features per layer, and more complex analysis).
  • 2x medium, 3x medium, 4x medium, …10x medium represent much more complex workflow loads that are possible with expanding technology and emerging display details.

Faster hardware processing allows more complex analysis to be included in the user workflows. These heavier complexity workflows (2x, 3x, 4x, …10x medium) may not handle a large number of concurrent users, but with today's technology they can deliver map display results in a reasonable response time (less than 5 seconds).

Best practice: Performance expectations are established based on selected software processing complexity and vendor published hardware processing speed (per core performance).

Workflow performance targets

Computer platforms must be configured properly to support system performance requirements. There are many factors that contribute to user performance and productivity. Enterprise GIS solutions include distributed processing environments where user performance can be the product of contributions from several hardware platform environments. Many of these platform resources are shared with other users. Understanding workflow distributed processing loads provides a fundamental framework for deploying a successful enterprise GIS.

Fundamental framework for performance management:

  • System design performance expectations can be represented by established software processing performance targets.
  • These performance targets can be translated into specific performance validation milestones.
  • Performance can be validated at established software development and system deployment milestones.
  • Software processing complexity and/or hardware processing capacity can be adjusted at each milestone to deliver within established performance budget.
Best practice: Esri's understanding of GIS processing complexity and how this workload is supported by vendor platform technology is based on more than 20 years of experience. A balanced software and hardware investment, with capacity based on projected peak user workflow loads, can reduce cost and ensure system deployment success.


ArcGIS for Desktop Wkstn workflows

Figure 10.3 shows the primary ArcGIS for Desktop Wkstn workflow patterns.

Figure 10.3 shows the standard ArcGIS for Desktop workflow patterns with the desktop application installed on the client workstation. These workflows include direct connect to data sources, feature service connections, and connection to imagery data sources.

Custom loads for each of these workflow patterns can be generated on the CPT Calculator tab.

Standard workflows are available on the CPT Workflow tab. Standard workflows provide medium and heavy complexity performance targets for the most common workflow patterns.

ArcGIS for Desktop Citrix workflows

Figure 10.4 shows the primary ArcGIS for Desktop Citrix workflow patterns.

Figure 10.4 shows the standard ArcGIS for Desktop workflow patterns with the desktop application installed on a central host (Citrix) platform tier. Citrix is the most common desktop hosting services software used by ArcGIS clients. Other vendors (Microsoft, VMware, and Amazon) provide competitive solutions that have similar performance characteristics. These workflows include direct connect to data sources, feature service connections, and connection to imagery data sources.

Custom loads for each of these workflow patterns can be generated on the CPT Calculator tab.

Standard workflows are available on the CPT Workflow tab. Standard workflows include medium and heavy complexity performance targets for the most common workflow patterns.

ArcGIS for Server workflows

Figure 10.5 shows the primary ArcGIS for Server workflow patterns.

Figure 10.5 shows the standard ArcGIS for Server workflow patterns. These workflows include ArcGIS for Server Web service profiles supporting a variety of map, feature, and image service use cases.

Custom loads for each of these workflow patterns can be generated on the CPT Calculator tab.

Standard workflows are available on the CPT Workflow tab. Standard workflows include medium and heavy complexity performance targets for the most common workflow patterns.

Web service peak throughput

Figure 10.6 shows best practice for identifying Web services peak throughput.

Figure 10.6 shows a typical measured Web service throughput distribution. Web service loads vary throughout the day and month, depending on service use patterns.

System hardware and network infrastructure must be designed to support the peak workflow loads. For Web services, this peak load is identified as peak concurrent users (typical for internal Web services supporting a defined user environment) or peak hour transaction rate (typical for public Web services based on expected service popularity).

What is a valid user workflow?

Figure 10.7 A valid workflow provides sufficient time for the user to review the display and enter the next display request.

Figure 10.7 shows a valid workflow. All user workflow performance terms work together during each display transaction to satisfy business performance requirements.

Workflow specifications:

  • User productivity = 10 DPM/client (user workflow performance needs)
  • Display cycle time = 6 sec (60 seconds in a minute divided by 10)

For a given display executed on a given platform:

  • Display service time is a constant value.
  • In a shared server environment, queue time increases with increasing user loads (increasing server utilization).
  • As queue time increases, display response time increases.
  • For a fixed user productivity (10 displays per minute), computed user think time will decrease with increasing display response time.

Computed user think time must be equal to or greater than the minimum think time for a valid workflow.

Warning: At some point, computed user think time will be less than minimum think time (invalid user workflow).
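
A minimal sketch of this validity check, using the 10 DPM workflow specification above (the response time and minimum think time values are illustrative assumptions):

```python
# Valid workflow check: computed think time must meet the minimum think time.
def compute_think_time(productivity_dpm, response_time_sec):
    cycle_time = 60.0 / productivity_dpm      # seconds available per display
    return cycle_time - response_time_sec     # time left for the user to think

productivity = 10        # displays per minute per client (from the specification above)
response_time = 1.2      # sec (service time + queue time), assumed value
min_think_time = 3.0     # sec, assumed minimum review time

think_time = compute_think_time(productivity, response_time)   # 6.0 - 1.2 = 4.8 sec
print("Valid workflow" if think_time >= min_think_time else "Invalid workflow")
```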


User productivity adjustment

Figure 10.8 The CPT Design identifies an invalid workflow when computed think time is less than minimum think time. The CPT Adjust function reduces the user productivity value until computed think time = minimum think time. The workflow is valid once computed think time is equal to or greater than minimum think time.

During peak system loads, queue time can increase to a point where computed think time is less than minimum think time, as shown in Figure 10.8. User productivity must be adjusted (reduced) to represent a valid workflow.

The CPT resolves an invalid workflow by adjusting the workflow productivity:

  • Workflow productivity must be reduced to identify a valid workflow.
  • CPT includes a RESET ADJUST function that will automatically reduce workflow productivity to the proper reduced value.

CPT Design ADJUST function:

  • Valid system solution is reached when computed user think time is equal to or greater than minimum think time for all workflows.
  • Valid solution is identified on the CPT display once valid workflow is established.

CPT Design ADJUST process:

  • Iterative calculation that reduces user productivity for all invalid workflows and then re-computes the system solution.
  • If adjusted productivity provides minimum think time less than computed think time, the next iteration will increase productivity slightly and re-compute the system solution.
  • Iterations continue until the most critical adjusted computed think time = minimum think time.
Best practice: Enable iterative calculations in Excel Options > Formulas.
  • Maximum Iterations: 500
  • Maximum Change: 0.001
Warning: Excel will display a Circular Reference Warning if Enable iterative calculations is not selected. Iterative calculations are required for many of the CPT sizing calculations.
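
A minimal sketch of the adjustment idea outside Excel, assuming the multi-core queue model described later in this chapter; a bisection search stands in for the CPT's Excel iterative calculation, and all numeric inputs are illustrative:

```python
# Reduce workflow productivity until computed think time meets the minimum think time.
def response_time(productivity, users, service, cores):
    utilization = min(users * productivity * service / (60.0 * cores), 0.999)
    availability = 1.0 / (1.0 + utilization * (cores - 1))
    queue = availability * service * utilization / (1.0 - utilization)
    return service + queue

def think_time(productivity, users, service, cores):
    return 60.0 / productivity - response_time(productivity, users, service, cores)

def adjust(users, service, cores, min_think, target=10.0):
    if think_time(target, users, service, cores) >= min_think:
        return target                         # workflow already valid at target productivity
    low, high = 0.1, target                   # search for the reduced (valid) productivity
    for _ in range(100):
        mid = (low + high) / 2.0
        if think_time(mid, users, service, cores) >= min_think:
            low = mid                         # still valid, try higher productivity
        else:
            high = mid                        # invalid, reduce productivity further
    return low

# 60 concurrent users, 0.4 sec service time, 4-core server, 3 sec minimum think time.
print(round(adjust(users=60, service=0.4, cores=4, min_think=3.0), 2))   # ~9.65 DPM
```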
CPT workflow productivity adjustment
Best practice: System design should be upgraded to satisfy user productivity needs.


Geoprocessing services (batch workflows)

A batch process is a workflow that does not require user interaction. User inputs are provided before the process is executed. The process then runs without user input until the job is done. Figure 10.9 shows a diagram representing a batch process.

Figure 10.9 Batch process loads are sequential in nature and productivity depends on computed response time.

Most heavy GIS functions can be modeled as a batch process. GIS heavy batch processes, when deployed on Server, are often called geoprocessing services.

Geoprocessing functions can be deployed as a network service configured to handle multiple user service work requests.

  • Geoprocessing function runs as a sequential batch process.
  • Each concurrent geoprocessing instance consumes a single platform core.

Advantages of configuring geoprocessing functions as a network service:

  • Service work request is sent to a processing queue to await execution.
  • Specific number of server cores can be allocated to execute the service.
  • User can do other work while waiting for the work request to be serviced.

Batch process loads are modeled as a workflow with zero (0) think time (no user input between display transactions).

  • Batch productivity is calculated based on computed response time (60 seconds/response time = batch DPM).
  • Batch process queue time is limited to service contention (no random arrival queue time).
  • Displays are requested sequentially following each refresh.
  • Batch processes deployed on a single platform with local data source tend to consume a single processor core.
  • CPT Design tab will distribute loads across available core resources based on the batch workflow profile (the limiting system component will determine peak batch productivity).

Batch processing examples:

  • Map caching
  • Enterprise Geodatabase reconcile and post
  • Geodatabase replication
  • Heavy map printing jobs
  • Heavy routing analysis
  • Heavy imagery processing
  • Heavy geospatial analysis
  • Heavy network analysis
Best practice: Any heavy system-level geoprocessing function that may be requested by more than one user at a time should be separated from the user application workflows and executed as separate network batch process work request services.
Warning: The CPT Design productivity adjust function must be used to compute system loads and batch process productivity. Each concurrent batch process is identified in the CPT Design as a user (column C) or client (column D) instance.
CPT representation of batch processing loads
Best practice: Workflow selection should have the same load profile (client, web, GIS server, SDE, DBMS) as the batch process you wish to model. Total processing time is not important for modeling the load profile.

The batch process productivity must be computed to identify a valid workflow. Productivity will depend on the server loads and available system resources. A single batch process can take advantage of only one processor core.

Best practice: Recommended design practice - any heavy function (runs more than 30 seconds) that might be requested by several users at a time should be configured as a batch process (network service). A processing queue must be established for user work request input. Each batch instance (network service) will process requests sequentially based on available processor resources. The user can be notified once their work request is serviced.
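
A minimal sketch of the batch productivity calculation (zero think time), with illustrative service time values:

```python
# Batch productivity: displays are requested sequentially with zero think time,
# so productivity is 60 seconds divided by the computed response time.
def batch_dpm(response_time_sec):
    return 60.0 / response_time_sec

# Single batch instance on an otherwise idle platform: response time is
# essentially the workflow service time (no random-arrival queue time).
service_time = 2.5                            # sec per transaction, assumed value
print(batch_dpm(service_time))                # 24 displays per minute

# Each concurrent batch instance consumes roughly one processor core, so total
# throughput scales with instances until the limiting component is saturated.
instances = 3
print(instances * batch_dpm(service_time))    # 72 DPM, assuming 3+ cores are available
```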

Platform throughput and service time

Figure 10.10 provides a chart showing the relationship between utilization and throughput; a simple relationship that can be used to identify platform capacity.

The most important system performance terms define the average work transaction (display), work throughput, system capacity, and system utilization. Figure 10.10 provides a chart showing the relationship between utilization and throughput; a simple relationship that can be used to identify platform capacity.

Capacity (DPM) = Throughput (DPM)/Utilization

Best practice: If you know the current throughput (users working on the system) and you measure the system utilization (average computer CPU utilization), then you can calculate the capacity of the server.

The relationships between throughput, capacity, and utilization hold based on how these terms are defined.

  • Throughput is the number of work transactions being processed per unit time.
  • Capacity is the maximum throughput that can be supported by a specific hardware configuration.
  • Utilization is the ratio of the current throughput to the system capacity (expressed as percentage of capacity).

The processor core is the hardware that executes the computer program instructions.

  • The number of processor cores identifies how many instances can be serviced at the same time.
  • Service time is a measure of the average work transaction processing time.

Work transaction service time is a key term used to measure software performance.

  • The software program provides a set of instructions that must be executed by the computer to complete a work transaction.
  • The processor core executes the instructions defined in the computer program to complete the work transaction.

Transactions with more instructions represent more work for the computer, while transactions with fewer instructions represent less work for the computer.

The complexity of the computer program workflow can be defined by the amount of work (or processing time) required to complete an average work transaction.

  • Service time on the CPT Workflow tab is presented relative to a platform performance baseline.
  • Faster platform processor cores execute program instructions in less time than slower processor cores.
  • Service time can be computed using a simple formula based on number of processor cores and platform capacity.

Service time (sec) = 60 sec x #core/Capacity (DPM)

Service time can be computed based on measured throughput and utilization.

Figure 10.11 Service time calculations for peak loads generated at each web service instance configuration.

Figure 10.11 shows service time results for five different throughput loads.

  • The number of deployed service instances determines the peak loads.
  • Throughput and utilization are measured for each of the five separate test configurations.
  • Capacity of 714 DPM was calculated from each test load.
  • Service time of 0.34 sec was calculated from each test load.
Best practice: You can calculate capacity from throughput and utilization measurements at any system load.
Note: Real operational environments can provide a very good measure of capacity.

Once you know the platform capacity, you can compute the platform service time.
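
A minimal sketch of these two calculations; the measured throughput and utilization values are illustrative, and the 4-core platform size is an assumption chosen to be consistent with the Figure 10.11 results (714 DPM capacity, roughly 0.34 sec service time):

```python
# Capacity and service time from measured throughput and utilization.
def capacity_dpm(throughput_dpm, utilization):
    return throughput_dpm / utilization                  # Capacity = Throughput / Utilization

def service_time_sec(cores, capacity):
    return 60.0 * cores / capacity                        # Service time = 60 x #cores / Capacity

throughput = 300      # measured displays per minute (illustrative sample)
utilization = 0.42    # measured average CPU utilization (illustrative sample)
cores = 4             # assumed platform size

capacity = capacity_dpm(throughput, utilization)
print(round(capacity), round(service_time_sec(cores, capacity), 2))   # 714 DPM, 0.34 sec
```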

Display response time

Figure 10.12 Display response time increases with increased platform loads.

Figure 10.12 shows the relationship between response time, service time, and queue time. Response time is the total time required to refresh the client display. This time includes the system processing times (service times) and any system wait times (queue time).

  • Service time is the total processing time to complete the display, and depends on the complexity of the data and any analysis functions required to display the final information product on the client machine.
  • Queue time depends on service contention (multiple requests for service on limited processing resources). Service contention can occur during server processing loads or traffic contention across the network. Queue time also includes sequential packet travel times (latency) when delivering messages and data across the network.
Best practice: Response time measurements during light system loads on a local platform (not across a network) can be used to estimate service time.

Platform performance and response time

Figure 10.13 Display response time increases with increased platform loads.

Figure 10.13 provides a chart showing the relationship between utilization and response time.

You can calculate display service time if you know the platform throughput and corresponding utilization, calculated at any throughput level. Calculating user display response time for shared system loads is a little bit more difficult.

Calculating user response time:

  • Only one user transaction can be serviced at a time on each processor core.
  • If many user transaction requests arrive at the same time, some of the transactions must wait in line while the others are processed first.
  • Waiting in line for processing contributes to system processing delays.
  • User display response time must include time for all the system component processing times and system delays, since the display is not complete until the final processing is done.

Any system time where a transaction request must wait in line for processing is called queue time.

Response time is the sum of the total service times (processing times) and queue times (wait times) as the transaction request travels across system components to the server and returns to deliver the final user display.

Response time (sec) = Service time (sec) + Queue time (sec)

Warning: Queue time increases to infinity as any processing component of the system approaches full capacity.

Response time is important, since it directly contributes to user productivity.

Productivity = 60 sec/(response time + think time)

Warning: As queue time increases, response time increases and productivity decreases.

Platform queue time

Computing response time is a common problem for many business applications. To get it right, you have to understand queue time. The theory of queues or waiting in line has its origin in the work of A. K. Erlang, starting in 1909.

Figure 10.14 Transaction request queue time will vary with platform utilization and number of platform core.

Figure 10.14 shows a formula for queue time and also a graph showing the relationship between queue time and platform utilization. The number of platform processor cores determines the sensitivity of queue time to platform utilization.

The simplest queuing models work for large populations of random arrival transactions, which should certainly be the case when modeling computer computations (thousands of random computer program instructions being executed within a relatively small period of time—e.g., seconds).

The queue time calculation used in the Capacity Planning Tool is a simplified model developed from operations research queuing theory.

  • The second half of the model (single-core section) is quite straightforward, and there is general agreement that this simple model identifies wait times in the case of a single service provider (single-core platform or single network connection).
  • The multi-core case is a little more complicated, and unfortunately is the more common case we need to deal with in multi-core server platform configurations.

Queue time model

The single-core platform queue time increases with increasing service time and platform utilization.

Queue time (single-core) = service time (sec) x utilization/(1 - utilization).

Queue time is zero (0) when utilization is zero (0) and increases to infinity as utilization approaches 100 percent.

In the multi-core platform case, it is important to include the probability of a processor core being available to service the request on arrival (not busy).

  • The more processor cores in the server, the more likely one of these cores will be available for processing when the service transaction arrives.
  • The equation simplifies to the simple single-core formula when the number of processor cores = 1.

Multi-core availability = 1/{1 + utilization x (cores - 1)}

Queue time = Multi-core availability x Queue time (single-core)
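
A minimal sketch of the queue and response time calculation from the formulas above (the service time, utilization, and core counts are illustrative):

```python
# Queue time and response time from the single-core and multi-core formulas above.
def queue_time(service, utilization, cores):
    single_core = service * utilization / (1.0 - utilization)
    availability = 1.0 / (1.0 + utilization * (cores - 1))
    return availability * single_core

def response_time(service, utilization, cores):
    return service + queue_time(service, utilization, cores)

# Illustrative values: 0.5 sec service time at 80 percent utilization.
for cores in (1, 2, 4, 8):
    print(cores, round(response_time(0.5, 0.80, cores), 2))
# 1 core: 2.5 sec, 2 cores: 1.61 sec, 4 cores: 1.09 sec, 8 cores: 0.8 sec
```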

The derived queue time formula provided above has been compared against several benchmark test results, and the computed response time was reasonably close to the measured test results (showing conservative response times, slightly higher than measured values).

It is important to recognize that the accuracy of the queue time calculation impacts only the expected user response time, and does not reduce the accuracy of the platform capacity calculations provided by the earlier simple relationships.

  • For many years, Esri capacity planning models did not include estimates for user response time.
  • Workflow response time is important, since it directly impacts user productivity and workflow validity.
  • If display response times are too slow, the peak throughput estimates would not be achieved and the capacity estimates would not be conservative.
Best practice: Including user response time in the capacity planning models provides more accurate and conservative platform specifications, and gives customers a better understanding of user performance and productivity.

Queue time derivatives

Peak system loads with display response time = 2 seconds

Multi-core servers provide better quality of service than single-core servers during heavy loads.

  • Eight 1-core servers provide throughput of 14,400 TPH with two-second response time.
  • Four 2-core servers provide throughput of 17,856 TPH with two-second response time.
  • Two 4-core servers provide throughput of 22,176 TPH with two-second response time.
  • One 8-core server provides throughput of 25,344 TPH with two-second response time.
Warning: More cores per server improves throughput only when display service times are the same for all configurations.
CPT Design multi-core platform performance demonstration
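
The throughput figures above can be approximately reproduced with the queue time model from the previous section, assuming a 1-second display service time (an assumption chosen to be consistent with the published numbers):

```python
# Peak throughput at a 2-second response time for each platform configuration.
def response_time(service, utilization, cores):
    availability = 1.0 / (1.0 + utilization * (cores - 1))
    return service + availability * service * utilization / (1.0 - utilization)

def peak_tph(service, cores, servers, target_response=2.0):
    low, high = 0.0, 0.999                     # search for utilization at the target response time
    for _ in range(60):
        mid = (low + high) / 2.0
        if response_time(service, mid, cores) < target_response:
            low = mid
        else:
            high = mid
    capacity_dpm = 60.0 * cores / service      # per-server capacity
    return round(servers * low * capacity_dpm * 60)   # transactions per hour

for cores, servers in ((1, 8), (2, 4), (4, 2), (8, 1)):
    print(cores, "core x", servers, "servers:", peak_tph(1.0, cores, servers), "TPH")
# roughly 14,400 / 17,800 / 22,100 / 25,300 TPH
```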

How to size the network

Figure 10.15 Display response time increases with increased network loads.

Figure 10.15 provides a chart showing the relationship between network utilization and response time. Performance models used to support network communications follow the same type of terms and relationships identified for server platforms.

Some of the same performance terms are referenced by different names.

  • Network transaction = display
  • Network throughput = traffic
  • Network capacity = bandwidth
  • Network utilization = utilization

The network connection (switch port, router port, network interface card, hardware bus adapter, etc.) is the hardware that processes the network traffic.

  • Most local networks are identified as single path systems.
  • Multiple NIC cards or multiple network paths can improve throughput utilization.

Additional performance terms:

  • Network service time = network transport time
  • Network queue time = network congestion delays
  • Network latency delay time = measured latency (round trip travel time) x chatter (round trips)
Best practice: CPT includes network as additional system component when computing system performance.
Warning: Network performance can be the most critical design constraint for many distributed system design solutions.
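
A minimal sketch of the corresponding network contribution to display response time (the traffic, bandwidth, latency, and chatter values are illustrative assumptions):

```python
# Network time per display: transport (service) time plus congestion (queue)
# time on a single shared path plus latency delays.
def network_time(traffic_mb, bandwidth_mbps, utilization, latency_sec, chatter):
    transport = traffic_mb / bandwidth_mbps                     # network service time
    congestion = transport * utilization / (1.0 - utilization)  # single-path queue time
    latency_delay = latency_sec * chatter                       # round-trip time x round trips
    return transport + congestion + latency_delay

# Example: 2 Mb per display over a 45 Mbps WAN link at 60 percent utilization,
# 10 ms round-trip latency, 20 round trips (chatter) per display.
print(round(network_time(2.0, 45.0, 0.60, 0.010, 20), 3))       # ~0.311 sec
```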

What is system performance?

Figure 10.16 System performance must consider service time and queue time contributions for components across the distributed system environment.

Figure 10.16 shows the information provided by the CPT Workflow Performance Summary. Workflow service times and queue times are shown in a stacked bar chart. Response time, shown as the height of the stack, is the total time required to complete the work transaction.

The Workflow Performance Summary chart shows the performance of 10 separate benchmark tests.

  • Tests were performed on 2-core servers.
  • Number of concurrent batch processes was increased with each test run.
  • First two tests (1 and 2 batch processes) response time was about the same.
  • Response time increased linearly for tests with more than 2 batch processes.

Response time includes all of the processing times and queue times experienced in completing an average work transaction.

  • Platform service and queue times
  • Network transport and queue times
  • Latency travel time delays
  • Client service time

Server deployment transaction throughput capacity constraints

Several technology factors impact performance and scalability of deployed server systems. Selecting the optimum configuration strategy will help ensure peak system throughput and optimum return on investment. The following technology factors are important in developing an optimum ArcGIS deployment solution.

ArcGIS for Server Site processing overhead

Figure 10.17 ArcGIS for Server Site communication overhead will vary with transaction throughput and number of GIS Server machines.

ArcGIS for Server provides a cluster aware capability that supports deployment of multiple clustered server machines within a single ArcGIS for Server site. Each GIS Server machine communicates with a common configuration store and with each machine in the site for load balancing and service transaction assignment. ArcGIS for Server cluster aware processing overhead varies based on the transaction throughput rate and the number of machines in the ArcGIS for Server site.

Figure 10.17 shows the peak transaction throughput based on number of machines in the ArcGIS for Server site. Five services with different levels of complexity are shown in the graphic. The results demonstrate that very light services with very high transaction rates do not scale out well due to the multiple machine site communication overhead.

CPT Design demonstration of ArcGIS for Server Site scalability

ArcGIS 10.3.1 for Server provides an option to remove internal load balancing for GIS Server site machines for a more scalable configuration. This siloed deployment option (internal site communications are disabled) is limited to single cluster sites and provides linear scalability.

CPT Design demonstration of ArcGIS for Server single-cluster site scalability

Virtual Server consolidation

Figure 10.18 ArcGIS for Server deployed in a physical server architecture.

Figure 10.18 shows a typical Enterprise GIS production environment supported by a physical server architecture.

For many years, data centers were supported by physical server configurations. With physical server deployment:

  • Many servers were required to support Enterprise operations.
  • Many servers were performing well below their optimum capacity.
  • The high number of servers contributed to high data center power consumption.
Figure 10.19 ArcGIS for Server deployed in a virtual server architecture.

Figure 10.19 shows a typical Enterprise GIS production environment supported by a virtual server architecture. Virtualization reduces the total number of data center physical servers.

Virtual server machines are deployed on host server platforms.

  • Multiple virtual machines can be supported by a single host server configuration.
  • Host platforms can run at optimum capacity levels (50 percent to 80 percent utilization).
  • Virtual Server architecture can be deployed to optimize host platform processing loads.

Virtualization: Host server processing loads

Figure 10.20 Virtual Server host machine hypervisor processing overhead.

Figure 10.20 shows the hypervisor processing loads supported on the host platform and the impact on virtual server utilization.

Virtual Server machines (VM) are deployed on a host platform, with access to processing resources controlled by a hypervisor. The hypervisor assigns VM virtual cores to host platform hardware CPU resources, allocating available processing resources among the deployed VMs.

Hypervisor processing loads are supported directly by the host platform and can be serviced by available host CPU resources separate from the CPU resources assigned to Virtual Server machines (if extra CPU resources are available). When host platform CPU resources are limited, the hypervisor must compete with the VM cores for access to available host platform resources.

Test results show hypervisor loads may account for up to 50 percent of the total virtual server processing loads. Virtual cores for each VM must be assigned to available host platform physical cores for processing. Optimum VM throughput is achieved when sufficient host resources are available to support all VM processing requests along with the hypervisor processing load without having to compete for processing resources. As host platform utilization approaches 100 percent, VM utilization will be limited based on available host resources.

CPT Design demonstration of ArcGIS for Server Virtual Machine (VM) performance.

Available Virtual Server machine utilization and throughput is limited by hypervisor processing overhead when virtual servers must compete for available host platform processing resources.

Best practice: Provide the host platform with 50 percent more processing capacity than required by the virtual servers.
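
A minimal arithmetic sketch of this host sizing guidance (the virtual server load is an illustrative assumption):

```python
# Host platform sizing with headroom for hypervisor overhead.
vm_processing_load_cores = 8      # peak processing load of all deployed VMs, assumed value
headroom_factor = 1.5             # 50 percent more capacity than the virtual server load

required_host_cores = vm_processing_load_cores * headroom_factor
print(required_host_cores)        # 12 cores before rounding up to an available platform size
```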

Esri/VMware joint benchmark testing reports.

Test results show significant virtual server performance improvements with the more recent VMware vSphere technology. The October 2011 testing showed slightly more than 10 percent virtual server processing overhead per core, while the July 2013 testing showed limited performance degradation between physical and virtual server deployment configurations when the virtual host platform performs at levels less than 90 percent utilization.

Note: July 2013 testing showed virtual server hypervisor overhead of 30 percent running on the host platform (50 percent of the VM loads).


Performance Validation

Planning provides the first opportunity for building successful GIS operations. Getting started right, understanding your business needs, understanding how to translate business needs to network and platform loads, and establishing a system design that will satisfy peak user workflow requirements is the first step on your road to success.

Planning is an important first step, but it is not enough to ensure success. If you want to deliver a project within the initial planning budget, you need to identify opportunities along the way to measure progress toward your implementation goal. Compliance with performance goals should be tracked throughout initial development, integration, and deployment by integrating performance validation measurements at each stage. Project success is achieved by tracking step-by-step progress toward your implementation goal and making appropriate adjustments to deliver the final system within the planned project budget. The earlier you identify a problem, the easier it will be to fix. System performance can be managed like any other project task. We showed how to address software performance in Chapter 3, network performance in Chapter 5, and platform performance in Chapter 7. If you don't measure your progress as these pieces come together, you will miss the opportunity to make the adjustments needed to ensure success.

Figure 10.21 Manage performance by sharing display complexity targets and measuring compliance throughout system deployment.

Figure 10.21 shows three key opportunities for measuring performance compliance. When possible it is important to take advantage of opportunities throughout system development and deployment where you can measure progress toward meeting your performance goals. The CPT Test tab includes four tools you can use to translate live performance measurements to workflow service times – the workflow performance targets used to define your initial system design.

Map display render times

In Chapter 3 we shared the important factors that impact software performance. For Web mapping workflows, map complexity is the primary performance driver. Heavy map displays (lots of dynamic map layers and features included in each map extent) contribute to heavy server processing loads and network traffic. Simple maps generate lighter server loads and provide users with much quicker display performance. The first opportunity for building high performance map services is when you are authoring the map display.

There are two map rendering tools available on the CPT Test tab that use measured map rendering time to estimate equivalent workflow service times. One tool is available for translating ArcGIS for Desktop map rendering times (MXD) and the other tool is for translating ArcGIS for Server map service rendering times (MSD). With both tools, measured map rendering time is translated to workflow services times that can be used by the CPT Calculator and Design tabs for generating your platform solution. The idea is to validate that your map service will perform within your planned system budget by comparing the workflow service times generated from your measured rendering times with your initial workflow performance targets. If the service times exceed your planned budget, you should either adjust the map display complexity to perform within the initial planning budget or increase your system performance budget. The best time to make the map display complexity adjustment is during the map authoring process. Impacts on the project budget can be evaluated and proper adjustments made to ensure delivery success.

Map publishing preview render times

Measured MSD render time

MSD render time can be measured when publishing your map service using the service editor preview tool.

Warning: Make sure to measure a map location that represents the average map complexity or higher within your service area extent.

MXDPerfStat render times

Measured MXD render time

MXD render time can be measured using the MXDperfstat ArcScript performance measurement tool.

Warning: Make sure to measure a map location that represents the average map complexity or higher within your service area extent.

System test measured throughput and platform utilization

Measured throughput and platform utilization

If you know your platform configuration, your measured peak workflow throughput, and the associated platform utilization, the CPT can calculate the workflow service times. The Test tab translation tools can be used to input throughput (transactions per hour), the platform configuration (server platform selection), and the measured platform utilization, and Excel will translate these inputs to equivalent workflow service times.

Best practice: Performance metrics can be collected from benchmark test or live operations.
Warning: Make sure all measurements are collected for the same loads at the same time.
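
A minimal sketch of this translation (the measured values and platform size are illustrative assumptions):

```python
# Translate measured throughput and platform utilization to an equivalent
# workflow service time, using the capacity relationships from this chapter.
def workflow_service_time(throughput_tph, utilization, cores):
    throughput_dpm = throughput_tph / 60.0
    capacity_dpm = throughput_dpm / utilization
    return 60.0 * cores / capacity_dpm            # sec per transaction

# Example: 12,000 transactions per hour measured at 55 percent utilization
# on an assumed 4-core GIS server tier.
print(round(workflow_service_time(12000, 0.55, 4), 2))   # 0.66 sec
```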

System monitor concurrent users and platform utilization

Measured peak concurrent users and platform utilization translator

If you don't have measured throughput, concurrent users working on the system can be used to estimate throughput loads. This is a valuable tool for using real business activity to validate system capacity (business units identify peak user loads and IT staff identify server utilization observed during these loads). The Test tab can be used to input throughput (peak concurrent users), the platform configuration (server platform selection), and the measured platform utilization, and Excel will translate these inputs to equivalent workflow service times.

Best practice: Analysis assumes peak users are working at web power user productivity (6 DPM) over a reasonable measurement period (10 minutes).
Warning: Make sure all measurements are collected for the same loads at the same time.
Move Test tab derived workflow service times to project workflows.

The CPT Workflow tab is where the results of your performance validation efforts come together. You can bring all your test results together, along with the original workflow service times, to validate that you are building a system that will perform and scale within your established project performance budget.

Best practice: Performance management, including performance validation throughout development and system delivery, is the key to implementation success. It is important that you identify the right technology and establish reasonable performance goals during your initial system design planning. It is even more important that you monitor progress in meeting these goals throughout final system development and delivery.

Capacity Planning

The models supporting Esri capacity planning today are based on the performance fundamentals introduced in this section. Platform capacity is determined by the software processing time (platform service time) and the number of platform cores, and is expressed in terms of peak displays per minute. Platform capacity (DPM) can be translated to supported concurrent users by dividing by the user productivity (DPM/client).
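
For example, a short sketch of that final translation, reusing the 714 DPM capacity figure from earlier in this chapter and the 10 DPM user productivity from the workflow specification:

```python
# Translate platform capacity to supported concurrent users.
capacity_dpm = 714        # peak platform capacity (displays per minute)
productivity_dpm = 10     # user productivity (displays per minute per client)
print(capacity_dpm / productivity_dpm)    # ~71 concurrent users
```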

The performance fundamentals discussed in this chapter are basic concepts that apply to any computer environment, and an understanding of these fundamentals can establish a solid foundation for understanding system performance and scalability. Software and hardware technology will continue to change, and the terms and relationships identified in this section can be used to normalize these changes and help us understand what is required to support our system performance needs.

The next chapter will provide an overview of the Capacity Planning tools introduced throughout the previous chapters. The CPT videos at the end of this chapter focus on system performance validation – showing how the fundamental performance terms and relationships are used by the CPT to connect user requirements with system hardware loads, and how these loads are used to identify appropriate hardware requirements. Performance validation during system design and deployment is also a key topic, sharing how the CPT Test tools can be used to translate real performance measurements to equivalent workflow service times for performance validation.

CPT Capacity Planning videos

Previous Editions

Performance Management 38th Edition
Performance Management 37th Edition
Performance Management 36th Edition
Performance Management 35th Edition
Performance Management 34th Edition
Performance Management 33rd Edition
Performance Management 32nd Edition
Performance Management 31st Edition
Performance Fundamentals 30th Edition
Performance Fundamentals 29th Edition
Performance Fundamentals 28th Edition
Performance Fundamentals 27th Edition

