SAPADMIN and Amazon Web Services

SAP has certified the Amazon Web Services cloud as a suitable platform for running production instances of some products. The Amazon cloud is probably the most well known of the Infrastructure as a Service cloud vendors. Before making any sizing decisons or or decisons regarding using AWS for SAP systems, please check the latest version of the Operating SAP Solutions on AWS White Paper (PDF). This details the special considerations for SAP Systems on AWS, including some Operating System restrictions.

However, there are some other caveats and gotchas that you need to be aware of before putting any system (SAP or otherwise – even your Development, Testing or QA instances, let alone Production instances) in any cloud environment. It is sometimes tempting, even at a very high-level, to think of cloud based infrastructure as a form of what used to be called remote computing, where the datacenter is located some distance from the users, administrators and developers, just much cheaper to use and much quicker to provision. For most parts of an SAP implementation, this does hold true; users connect via NWBC, a browser or the
SAP GUI to a DNS name, and manipulate the information they find – they add to it, update it, share it, regardless of where it’s stored and the computer(s) used to perform the work.

However, this does avoid a key concept of Cloud computing which the idea of commodity virtualisation of everything. So, bearing this in mind, let’s explore some important lessons about Cloud Computing.

Lesson 0: Only the paranoid survive

Andrew Grove was chairman of Intel when he published a business book called ‘Only the Paranoid Survive’. It sounds like an awfully cold way to deal with business colleagues, but when it comes to down to me and the computers, it has been a useful one.

Lesson 1: SLAs Are Meaningless

You can’t compare any kind of hosting services based on their advertised SLAs. Instead, base your comparisons on their response to you and your company’s issues. Regardless of what they say, ‘stuff’ will happen. Yes, Amazon has a service level agreement for EC2 of 99.95% uptime, averaged over the last year. You would imagine that this was set (by Amazon) based on historical information. However, as they say in the financial pages “historical behaviour is not an indicator of future performance”. And when ‘stuff’ happens, where are you in the queue, for personal attention, recompense, or even just a communication of some sort ?

By the way, due mainly to the recent outage, EC2′s uptime over the last year is around 99.5%.

Lesson 2: YOUR Architecture CAN save You from Cloud Failures, but …

Disaster Recovery processes have two major SLAs; the Recovery Time Objective, which is a duration of time (an SLA, really) within which a business process must be restored after a disaster (or disruption), and the Recovery Point Objective which describes the acceptable amount of data loss measured in time. By the way, the O stands for Objective, not Agreement or Mandate (see Lesson 1).

This means that if an instance becomes unavailable to the business, they want a working system within the RPO time, with data loss of less than the RTO. This requires the same thinking and planning that goes into Disaster Recovery planning for an in house system. In turn, this means managing and planning for Disaster Recovery and Data Security, and allowing for the typical requirements of a Disaster Recovery Plan, except with a Cloud twist to them…

  • You still need to choose the right infrastructure,
    i.e. Does your vendor have seperate physical locations ?
  • You need to manage your view of the infrastructure,
    i.e. How easy is it to transfer backups from one physical location to another ?
  • You still need to test the transfer of backup data,
  • You still need to test the restore / restart of your system in the alternate location,
  • Your vendor may provide alternate physical locations,
    but do you have / need an alternate provider ?
  • and so on.

Lesson 3: There is a BIG difference between virtual machines and the hardware.

Things get a little more difficult at the micro level. Fault-tolerant environments are a centerpiece of the cloud hype, but generally, most developers don’t see, and therefore don’t think, about the difference between virtual and physical hardware. The issue with virtual machines (in-house virtualisation or clouds) is that the view from the operating system ends at the hypervisor. You can not see what happens at the metal. Now, for computer systems to work as we have grown to expect, certain things are sacrosanct. This is because without them, there is no guarantee that what we write will be there when we go to read it (this applies just as much to memory as it does to disk).

An example is the sync() or fsync() system call, that instructs the Operating System to write all the data currently in the filesystem buffers, out to disk. Now, in virtual machines, whether or not fsync() does what it should is a bit of a mystery. In fact, there has been suggestions that in particular circumstances and under high load Amazon’s Elastic Block Store, at least according to sources close to Reddit, will happily accept calls to fsync(), saying that the data has been written to disk, when it may not have been.

No amount of virtual architecture is going to save you from virtual hardware that lies.

Lesson 4: You don’t HAVE to put ANYTHING in the cloud.

The general rule is that if the machine / image dies, then you must be able to recover data, or restore the service. If you’re hosting a database server, then it will need to be restored or recovered. On the other hand, an application server is much simpler; just write some configuration files. Once you start looking at it like this, it may make sense for a more risk adverse site to put some server types into the cloud and leave others in the data centre. In short, Virtualisation and Cloud computing is not a universal panacea to hardware resource problems.

Of course, many people would say that “commodity” computing is a misnomer, because servers are not really something that should be commoditized, that a “pick one of four sizes” offering is insulting. To a certain extent this is true, but Cloud computing servers are so cheap that you can build around inefficiencies in some parts of the commodity offering by overcompensating in others.

For example, once people realise how cheap CPU and Memory are on IaaS services, they tend to go at least one ‘size’ higher than they would for an in-house server, and they still see massive savings. Regardless of what the purist thinks, it is becoming much more business-efficient to throw hardware at performance problems than it is to spend time investigating the root cause, which leads into …..

Lesson 5: You still need to tune and manage your systems.

In Cloud computing costs are tied directly to resource usage. The virtues of cloud computing are a double edged-sword; Because
provisioning systems is so easy, you may see developers running a dozen tests at once, instead of one after another, to speed up implementation cycles. This means any inefficiencies in the base systems used for such
testing will be magnified, which will directly impact costs.

Just as importantly, resource usage variations in your production systems will show up directly in the bill. However, the customer or business user paying the bill will want to know why these variations have occured. Are they due to different processing rules, different volumes,
program or system changes ? You want to see a consistent relationship
between the business workload and the resource usage (and therefore
cost). This makes budgeting and planning much easier for the Business,
and provides them with confidence in both the SAP support teams and the
platform.

Lesson 6: It is not enough to be secure….

…you need to be seen to be secure. Amazon already performs regular scans of the AWS entry points, and independent security firms perform regular external vulnerability threat assessments, but these are checks of the AWS infrastructure (such as their payment gateways, user security and so on). They don’t replace your own vulnerability scans and penetration tests. Because it may be mistaken as a network attack, Amazon ask to be advised of any penetration tests you wish to perform. These must be limited to your own instances.

Being seen to be secure also means using all the features (including the Amazon Virtual Private Cloud) that are referenced in the AWS Security White Paper. This document, which is updated regularly, describes Amazon’s physical and operational security principles and practices.
It includes a description of the shared responsibility for security, a
summary of their control environment, a review of secure design
principles, and detailed information about the security and backup
considerations related to each part of AWS including the Virtual Private
Cloud, EC2, and the Simple Storage Service,

The new AWS Risk and Compliance White Paper
covers a number of important topics including (again) the shared
responsibility model, additional information about the control
environment and how to evaluate it, and detailed information about the AWS
certifications. Importantly, it also includes a section on key compliance
issues which addresses a number of topics that get asked about on a
regular basis.

Summary

There are differences between managing real servers, virtual servers and Cloud based servers. However, much of what is required for SAP landscapes and Implementations is the same which ever platform you use. In fact the BASIS team may be the only people who notice the difference. One of the biggest differences is the perception of control and ownership, because you can’t “hug your server” any more. What are the biggest differences you see, and how do you see them impacting you if or when your organisation starts implementing SAP systems in the Amazon Cloud ?