
Posted on Apr 26, 2011 in The Cloud

The day the cloud stood still. Lessons learned roundup…

The well-publicized outage of EBS across multiple availability zones in the US-EAST-1 Region of AWS last week kicked off some excellent blog posts from companies who, through robust architectural choices, managed to weather the storm quite well. It lasted five days, it’s been called the worst cloud computing disaster ever, and Amazon’s communications strategy didn’t exactly shine, but it has presented an opportunity to learn from the companies that host their sites on the AWS cloud more successfully than many of their peers.

This is just a round-up of some of these posts, and the advice given. They’ve been edited down, of course, so be sure to read each of these articles for the whole story:

The Cloud is Not a Silver Bullet — Joe Stump, CTO of SimpleGeo

  • Everything needs to be automated. Spinning up new instances, expanding your clusters, backups, restoring from backups, metrics, monitoring, configurations, deployments, etc. should all be automated.
  • You must build share-nothing services that span AZs at a minimum. Preferably your services should span regions as well, which is technically more difficult to implement, but will increase your availability by an order of magnitude.
  • Avoid relying on ACID services. It’s not that you can’t run MySQL, PostgreSQL, etc. on the cloud, but the ephemeral and distributed nature of the cloud makes this a much more difficult feature to sustain.
  • Data must be replicated across multiple types of storage. If you run MySQL on top of RDS, you should be replicating to slaves on EBS, RDS multi-AZ slaves, ephemeral drives, etc. Additionally, snapshots and backups should span regions. This allows entire components to disappear and you to either continue to operate or restore quickly even if a major AWS service is down.
  • Application-level replication strategies. To truly go multi-region, or to span across cloud services, you’ll very likely have to build replication strategies into your application rather than relying on those inherent in your storage systems.
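Application-level replication is the least familiar of these points for many teams. Here’s a minimal sketch, assuming two hypothetical in-memory replicas that stand in for storage in different regions: the application itself writes every change to each replica and falls back to the next replica on reads. A production version would also need conflict resolution and a way to repair writes a replica missed while it was down.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one storage backend in one region."""
    region: str
    data: dict = field(default_factory=dict)
    available: bool = True

    def put(self, key, value):
        if not self.available:
            raise ConnectionError(f"{self.region} unavailable")
        self.data[key] = value

    def get(self, key):
        if not self.available:
            raise ConnectionError(f"{self.region} unavailable")
        return self.data[key]


class ReplicatedStore:
    """The application, not the database, copies each write to every replica."""

    def __init__(self, replicas, min_acks=1):
        self.replicas = replicas
        self.min_acks = min_acks

    def put(self, key, value):
        acks = 0
        for replica in self.replicas:
            try:
                replica.put(key, value)
                acks += 1
            except ConnectionError:
                pass  # a real system would queue this write for later repair
        if acks < self.min_acks:
            raise RuntimeError(f"only {acks} replicas acknowledged the write")
        return acks

    def get(self, key):
        # Read from the first replica that is reachable and has the key.
        for replica in self.replicas:
            try:
                return replica.get(key)
            except (ConnectionError, KeyError):
                continue
        raise KeyError(key)


east, west = Replica("us-east-1"), Replica("us-west-1")
store = ReplicatedStore([east, west])
store.put("user:1", "alice")
east.available = False       # simulate a regional outage
print(store.get("user:1"))   # → alice
```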

How SmugMug survived the Amazonpocalypse — Don MacAskill, CEO of SmugMug

  • Spread across as many AZs as you can. Use all four.
  • If your stuff is truly mission critical (banking, government, health, serious money maker, etc), spread across as many Regions as you can.
  • Beyond mission critical? Spread across many providers.
  • Since spreading across multiple Regions and providers adds crazy amounts of extra complexity, and complex systems tend to be less stable, you could be shooting yourself in the foot unless you really know what you’re doing.
  • Build for failure. Each component (EC2 instance, etc) should be able to die without affecting the whole system as much as possible.
  • Understand your components and how they fail. Use any component, such as EBS, only if you fully understand it. For mission-critical data using EBS, that means RAID1/5/6/10/etc locally, and some sort of replication or mirroring across AZs, with some sort of mechanism to get eventually consistent and/or re-instantiate after failure events.
  • Try to componentize your system. Why take the entire thing offline if only a small portion is affected?
  • Test your components. I regularly kill off stuff on EC2 just to see what’ll happen.
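Don’s advice to build for failure and then prove it by killing things can be practiced on a small scale. The Python sketch below uses hypothetical Instance and LoadBalancer classes (not any AWS API): it spreads six instances across three zones, kills random ones, and counts how many requests are still served.

```python
import random


class Instance:
    """A hypothetical server instance living in one availability zone."""

    def __init__(self, name, zone):
        self.name, self.zone, self.alive = name, zone, True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{request} served by {self.name} ({self.zone})"


class LoadBalancer:
    """Routes each request to any instance that is still alive."""

    def __init__(self, instances):
        self.instances = instances

    def route(self, request):
        # Try instances in a random order, skipping any that have died.
        for inst in random.sample(self.instances, len(self.instances)):
            try:
                return inst.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("total outage: no healthy instances left")


def chaos_test(lb, kills, requests=100, seed=0):
    """Kill `kills` random instances, then count how many requests still succeed."""
    rng = random.Random(seed)
    for victim in rng.sample(lb.instances, kills):
        victim.alive = False
    served = 0
    for i in range(requests):
        try:
            lb.route(f"req-{i}")
            served += 1
        except RuntimeError:
            pass
    return served


zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
fleet = [Instance(f"web-{i}", zones[i % 3]) for i in range(6)]
print(chaos_test(LoadBalancer(fleet), kills=2))  # → 100
```

Killing every instance is the only way to make this fleet fail, which is exactly the property the test is meant to demonstrate.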

AWS outage timeline & downtimes by recovery strategy — Eric Kidd, Randomhacks.net

Eric took an interesting look at various potential strategies, and how long a company would have been offline during the EBS outage:

  • Rely on a single EBS volume with no snapshots: 3.5 days
  • Deploy into a single availability zone, with EBS snapshots: over 12 hours
  • Rely on multi-AZ RDS databases to fail over to another availability zone: longer than 14 hours for some users.
  • Run in 3 AZs, at no more than 60% capacity in each: This is the approach taken by Netflix, which sailed through this outage with no known downtime
  • Replicate data to another AWS region or cloud provider: This is still the gold standard for sites which require high uptime guarantees.
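The arithmetic behind the “3 AZs at no more than 60% capacity” strategy is worth spelling out. Assuming load is spread evenly and shifts onto the surviving zones when one fails, N zones running at utilization u leave the survivors at N·u/(N−k) after k zones go down. For three zones at 60%, losing one pushes the other two to 90%: tight, but still under 100%. The highest per-zone utilization that survives one zone failure is 2/3, about 67%. A small Python sketch of that calculation:

```python
def survivor_utilization(zones, utilization, failed=1):
    """Utilization of the surviving zones after `failed` zones go dark,
    assuming load is even and rebalances onto the survivors."""
    if failed >= zones:
        raise ValueError("no zones left to absorb the load")
    return zones * utilization / (zones - failed)


def max_safe_utilization(zones, failed=1):
    """Highest per-zone utilization that still fits after `failed` zones fail."""
    return (zones - failed) / zones


print(round(survivor_utilization(3, 0.60), 2))  # → 0.9
print(round(max_safe_utilization(3), 3))        # → 0.667
print(survivor_utilization(2, 0.60) > 1)        # → True (two zones at 60% can't absorb a failure)
```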

The AWS Outage: The Cloud’s Shining Moment — George Reese, Founder of Valtira and enStratus

The Amazon model is the “design for failure” model. Under the “design for failure” model, combinations of your software and management tools take responsibility for application availability. The actual infrastructure availability is entirely irrelevant to your application availability. 100% uptime should be achievable even when your cloud provider has a massive, data-center-wide outage…

There are several requirements for “design for failure”:

  • Each application component must be deployed across redundant cloud components, ideally with minimal or no common points of failure
  • Each application component must make no assumptions about the underlying infrastructure—it must be able to adapt to changes in the infrastructure without downtime
  • Each application component should be partition tolerant—in other words, it should be able to survive network latency (or loss of communication) among the nodes that support that component
  • Automation tools must be in place to orchestrate application responses to failures or other changes in the infrastructure (full disclosure, I am CTO of a company that sells such automation tools, enStratus)
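The last requirement, automation that orchestrates the response to failure, usually takes the form of a desired-state loop. Here’s a minimal Python sketch with a hypothetical launch() function standing in for whatever provisioning API or tool you actually use: each pass drops unhealthy nodes, then launches replacements in the least-populated zone that is still accepting launches, routing around an impaired zone rather than assuming it will come back.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Node:
    name: str
    zone: str
    healthy: bool = True


def reconcile(nodes, desired_total, zones, launch):
    """One pass of a desired-state loop: keep healthy nodes, then launch
    replacements until `desired_total` is reached or no zone will accept one."""
    survivors = [n for n in nodes if n.healthy]
    open_zones = list(zones)
    while len(survivors) < desired_total and open_zones:
        # Prefer the zone with the fewest healthy nodes to keep the spread even.
        zone = min(open_zones, key=lambda z: sum(n.zone == z for n in survivors))
        try:
            survivors.append(launch(zone))
        except RuntimeError:
            open_zones.remove(zone)  # zone is impaired: route around it
    return survivors


# Usage with a hypothetical provisioning call and one impaired zone.
ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]
impaired = {"us-east-1a"}
ids = count()


def launch(zone):
    if zone in impaired:
        raise RuntimeError(f"{zone} is not accepting launches")
    return Node(f"node-{next(ids)}", zone)


fleet = [Node(f"old-{i}", z) for i, z in enumerate(ZONES * 2)]
for node in fleet:
    if node.zone in impaired:
        node.healthy = False  # the zone's existing nodes fail too
fleet = reconcile(fleet, 6, ZONES, launch)
print(sorted(n.zone for n in fleet))
# → ['us-east-1b', 'us-east-1b', 'us-east-1b', 'us-east-1c', 'us-east-1c', 'us-east-1c']
```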

Today’s EC2 / EBS Outage: Lessons learned — Stephen Nelson-Smith, Technical Director of Atalanta Systems

  • Expect downtime…What matters is how you respond to downtime
  • Use amazon’s built-in availability mechanisms
  • Think about your use of EBS:
    • EBS is not a SAN
    • EBS is multi-tenant…Consider using lots of volumes and building up your own RAID 10 or RAID 6 from EBS volumes.
    • Don’t use EBS snapshots as a backup…Although they are available to different availability zones in a given region, you can’t move them between regions.
    • Consider not using EBS at all
  • Consider building towards a vendor-neutral architecture…Cloud abstraction tools like Fog, and configuration management frameworks such as Chef make the task easier.
  • Have a DR plan, and practice it
  • Infrastructure as code is hugely relevant…one of the great enablers of the infrastructure as code paradigm is the ability to rebuild the business from nothing more than a source code repository, some new compute resource (virtual or physical) and an application data backup.
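To make the RAID 10 suggestion concrete: RAID 10 groups volumes into mirrored pairs (RAID 1) and stripes data across those pairs (RAID 0). The Python sketch below models that conceptually with hypothetical volume names; it is not the exact on-disk layout of any particular tool (on Linux you would typically assemble the array with mdadm). It shows which pair holds a given chunk, the usable capacity, and that the array survives losing one volume from each pair but not both volumes of the same pair.

```python
def mirror_pairs(volumes):
    """Group volumes into RAID 1 mirror pairs."""
    if len(volumes) < 4 or len(volumes) % 2:
        raise ValueError("RAID 10 needs an even number of volumes, at least 4")
    return [tuple(volumes[i:i + 2]) for i in range(0, len(volumes), 2)]


def raid10_layout(volumes, chunk):
    """Mirror pair that stores logical `chunk`: chunks stripe round-robin over pairs."""
    pairs = mirror_pairs(volumes)
    return pairs[chunk % len(pairs)]


def raid10_usable_gib(volume_count, volume_gib):
    """Usable capacity is half the raw space, since every chunk is stored twice."""
    return volume_count // 2 * volume_gib


def survives(volumes, failed):
    """The array survives as long as no mirror pair has lost both members."""
    return all(not set(pair) <= set(failed) for pair in mirror_pairs(volumes))


vols = ["vol-a", "vol-b", "vol-c", "vol-d"]
print(raid10_layout(vols, 0))               # → ('vol-a', 'vol-b')
print(raid10_layout(vols, 1))               # → ('vol-c', 'vol-d')
print(raid10_usable_gib(4, 100))            # → 200
print(survives(vols, ["vol-a", "vol-c"]))   # → True
print(survives(vols, ["vol-a", "vol-b"]))   # → False
```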

Posted on Apr 21, 2011 in The Cloud

It’s Not Broken. You’re Just Doing It Wrong.

Okay, so the title is a bit harsh.

I was intrigued by the rather excellent post over at the blog Il y a du thé renversé au bord de la table, [Rant] Web development is just broken. Yoric makes the argument that web developers are forced to deal with too many “nightmares” that have very little to do with programming. First you have to decide on a programming language. Should you use PHP, C#, Java, Ruby, Perl, or Python? Then you have to choose a web server and OS. Windows/IIS or *nix and Apache? OSX? BSD? Solaris? If you go with Linux, which distro do you choose? Is it worth it to pay for Red Hat, or will Fedora do? What about Ubuntu? Then you have to choose a DBMS, of course. Do you want Oracle? Well, can you afford Oracle? Then there’s MySQL, SQL Server, or PostgreSQL. Or maybe one of the NoSQL databases like MongoDB, CouchDB, or Cassandra. And then you probably want to choose a server-side framework. Rails? Spring? Zend? And a client-side framework, of course, so you don’t have to worry too much about all the differences between the JS engines in each different browser. jQuery? Prototype? Scriptaculous?

And then, once everything is selected, it all has to be configured to work together without (too many) security holes. But, of course, how much does the average developer really know about configuring a secure Linux environment with Apache? Or setting up a secure IIS? And even if the developer does know a lot about configuring all of this, wouldn’t it be more productive to have him or her focused on developing actual application features rather than mucking around in apache2.conf or php.ini, or trying to figure out why their package manager can’t find the right package for some random server component? How do I configure CPAN, again? Do I really need the Multiverse, or will the Universe do? Then, of course, you’ll probably want an ORM, and you’ll need to decide on how you want to glue all the bits and pieces together.

Not to mention keeping all of that up to date and working as new releases get rolled out… oh, and what about scaling up to meet the increased demand if you start to get really popular and get bought by Conde Nast?

Great points. Couldn’t agree more. Anybody guess where I’m going with this?

Tired of worrying about infrastructure? You want to start coding now? Great, take a look at Elastic Beanstalk, Heroku, or Force.com VMForce (yeah, I know, “coming soon”). No infrastructure setup required. You still have to choose a language and a platform, I guess, but that seems unavoidable. You have to make some choices in life. However, you don’t have to care about which OS or web server to use, and you don’t have to manage updates of server software. AWS might all be running in VMware within a virtualized Windows 98 stack based on a billion hand-built Commodore 64s for all I care. As long as it works. And the DBMS is a service too… you don’t have to set it up, you just pick whichever one you want. When VMForce is launched, you’ll have database.com as a DBMS. With Elastic Beanstalk, you have RDS or SimpleDB. With Heroku, you have PostgreSQL out of the box, with a ton of other choices available, but you don’t set them up yourself, you just add them to your account, and they get set up for you.

What about security? Does your data center have 24-hour manned security, including foot patrols and perimeter inspections? Well, Salesforce does. Is your server certified by PCI, ISO, SAS70, and HIPAA? Well, AWS is, and Heroku is hosted on AWS, and they have their own operations team that monitors the system 24/7. Even Multi-Factor Authentication is just another service at AWS. And if somebody finds a security flaw in any of these platforms, it’s not your problem. Somebody else can figure it out and fix it, hopefully before you even know about it. Of course, it’s still important to write secure code, sanitize user inputs, parameterize SQL queries, etc., but at least that’s all in _your_ code. You can focus on writing good code, and not on whether or not you accidentally configured an Apache mod incorrectly, or accidentally allowed anonymous FTP access to your web server, or if your version of PHP has a buffer overrun bug that will allow some random hacker to drop your User table.
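On the “parameterize SQL queries” point, which stays your job no matter which platform you pick, here’s a minimal Python sketch using the standard library’s sqlite3 module and an in-memory database. The ? placeholder passes the user’s input as a value, so a hostile string that tries to drop the User table is simply compared against names and matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO User (name) VALUES ('alice')")


def find_user(conn, name):
    # The ? placeholder sends `name` to the database as data, never as SQL text.
    return conn.execute("SELECT id, name FROM User WHERE name = ?", (name,)).fetchall()


hostile = "alice'; DROP TABLE User; --"
print(find_user(conn, hostile))   # → []  (no match, and no DROP executed)
print(find_user(conn, "alice"))   # → [(1, 'alice')]
```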

You’ll probably still need to glue some things together, and if you’re doing web development, you’ll still want a client-side framework so you don’t have to worry too much about all the various inconsistencies between browsers, but with the infrastructure headaches out of the picture, it’s easier to just start coding.


Posted on Apr 6, 2011 in Mobile

iOS Enterprise MDM Configuration Capabilities

 

Thought I’d put together an easy-to-reference list of the various things that can be configured by an enterprise Mobile Device Management administrator for iOS:

Password

  • Required
  • No Repeating/Ascending/Descending Characters
  • Require Alphanumeric
  • Minimum Password length
  • Minimum number of non-alphanumeric characters required
  • Maximum password age (1-730 days)
  • Auto-lock (1-5 minutes)
  • Password History (1-50 Passwords)
  • Grace Period for Device Lock (amount of time the device can be locked without prompting for a password on unlock)
  • Maximum number of failed attempts (before all data on device will be erased)
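Under the hood, these settings travel to the device as a configuration profile, which is an XML property list. The Python sketch below builds a passcode payload with the standard library’s plistlib. The payload type and key names (forcePIN, allowSimple, requireAlphanumeric, minLength, minComplexChars, maxPINAgeInDays, maxInactivity, pinHistory, maxGracePeriod, maxFailedAttempts) are assumed from Apple’s Configuration Profile Reference, and the com.example identifiers are placeholders, so check both against current documentation before deploying anything. The range checks come straight from the list above.

```python
import plistlib
import uuid

# Ranges from the list above; key names are assumptions to verify with Apple's docs.
RANGES = {"maxPINAgeInDays": (1, 730), "maxInactivity": (1, 5), "pinHistory": (1, 50)}


def passcode_payload(**settings):
    for key, (lo, hi) in RANGES.items():
        if key in settings and not lo <= settings[key] <= hi:
            raise ValueError(f"{key}={settings[key]} outside {lo}-{hi}")
    payload = {
        "PayloadType": "com.apple.mobiledevice.passwordpolicy",
        "PayloadVersion": 1,
        "PayloadIdentifier": "com.example.passcode",  # hypothetical identifier
        "PayloadUUID": str(uuid.uuid4()),
        "PayloadDisplayName": "Passcode policy",
    }
    payload.update(settings)
    return payload


def profile(*payloads):
    """Wrap payloads in a top-level configuration profile, serialized as XML."""
    return plistlib.dumps({
        "PayloadType": "Configuration",
        "PayloadVersion": 1,
        "PayloadIdentifier": "com.example.profile",  # hypothetical identifier
        "PayloadUUID": str(uuid.uuid4()),
        "PayloadDisplayName": "Example MDM profile",
        "PayloadContent": list(payloads),
    })


xml = profile(passcode_payload(
    forcePIN=True, allowSimple=False, requireAlphanumeric=True,
    minLength=8, minComplexChars=1, maxPINAgeInDays=90,
    maxInactivity=5, pinHistory=5, maxGracePeriod=0, maxFailedAttempts=10,
))
print(plistlib.loads(xml)["PayloadContent"][0]["minLength"])  # → 8
```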

Restrictions

  • Allow installing apps
  • Allow use of camera
    • Allow FaceTime
  • Allow Screen Capture
  • Allow Automatic Sync while Roaming
  • Allow voice dialing
  • Allow In App Purchase
  • Allow Multiplayer Gaming
  • Allow Adding Game Center Friends
  • Force Encrypted Backups
  • Applications
    • Allow use of YouTube
    • Allow use of iTunes Music Store
    • Allow use of Safari
      • Enable autofill
      • Force fraud warning
      • Enable JavaScript
      • Block Pop-ups
      • Accept Cookies: Always, Never, From Visited Sites
    • Allow Explicit Music & Podcasts
    • Allowed Content Ratings
      • Movies: Don’t Allow Movies, G, PG-13, R, NC-17, Allow All Movies
      • TV: Don’t Allow TV Shows, TV-Y, TV-Y7, TV-G, TV-PG, TV-14, TV-MA, Allow All TV Shows
      • Apps: Don’t Allow Apps, 4+, 9+, 12+, 17+, Allow All Apps

Wi-Fi

  • Service Set Identifier (SSID)
  • Hidden Network (if target network is set to not broadcast)
  • Security Type: Any (Personal), None, WEP, WPA/WPA2, WEP Enterprise, WPA/WPA2 Enterprise, Any (Enterprise)
  • Password
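The Wi-Fi settings map onto their own payload in the same profile format. Here’s a minimal Python sketch covering only the personal security types, assuming the com.apple.wifi.managed payload type and the SSID_STR, HIDDEN_NETWORK, EncryptionType and Password keys from Apple’s Configuration Profile Reference; the enterprise types need additional EAP settings that are left out here, and the identifier is a placeholder.

```python
import plistlib
import uuid

# Assumed mapping from the UI's personal security types to EncryptionType values.
ENCRYPTION = {"None": "None", "WEP": "WEP", "WPA/WPA2": "WPA", "Any (Personal)": "Any"}


def wifi_payload(ssid, security, password=None, hidden=False):
    if security not in ENCRYPTION:
        raise ValueError(f"unsupported security type: {security}")
    if security != "None" and not password:
        raise ValueError("a password is required for encrypted networks")
    payload = {
        "PayloadType": "com.apple.wifi.managed",
        "PayloadVersion": 1,
        "PayloadIdentifier": f"com.example.wifi.{ssid}",  # hypothetical identifier
        "PayloadUUID": str(uuid.uuid4()),
        "SSID_STR": ssid,
        "HIDDEN_NETWORK": hidden,  # set when the network doesn't broadcast its SSID
        "EncryptionType": ENCRYPTION[security],
    }
    if password:
        payload["Password"] = password
    return payload


p = wifi_payload("CorpWiFi", "WPA/WPA2", password="correct-horse", hidden=True)
print(p["EncryptionType"])  # → WPA
xml = plistlib.dumps(p)     # ready to go into a profile's PayloadContent array
```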

VPN

  • Connection Name
  • Connection Type: L2TP, PPTP, IPSec (Cisco), Cisco AnyConnect, Juniper SSL, F5 SSL, Custom SSL
  • Server Hostname or IP Address
  • Account
  • User Authentication: Password, RSA SecurID
  • Shared Secret
  • Send All Traffic (Route all network traffic through VPN)
  • Proxy Setup

Email

  • Account Description
  • Account Type: IMAP, POP
  • User Display Name
  • Email Address
  • Mail Server and Port
  • Authentication Type: None, Password, MD5 Challenge-Response, NTLM, HTTP MD5 Digest
  • Password
  • Use SSL

Exchange ActiveSync

  • Account Name
  • Exchange ActiveSync Host (Exchange Server)
  • Use SSL
  • Domain
  • User
  • Email Address
  • Password
  • Past Days of Mail to Sync: No Limit, 1 Day, 3 Days, 1 Week, 2 Weeks, 1 Month
  • Authentication Credential Name
  • Authentication Credential
  • Include Authentication Credential Passphrase

LDAP

  • Display Name
  • Account Username
  • Account Password
  • Account Hostname
  • Use SSL
  • Search Settings

CalDAV

  • Account Description
  • Account Hostname and Port
  • Principal URL
  • Account Username
  • Account Password
  • Use SSL

CardDAV

  • Account Description
  • Account Hostname and Port
  • Principal URL
  • Account Username
  • Account Password
  • Use SSL

Subscribed Calendar

  • Description
  • URL
  • Username
  • Password
  • Use SSL

Web Clips (web pages saved to the home screen as bookmarks)

  • Label
  • URL
  • Removable (yes/no)
  • Icon
  • Precomposed Icon
  • Full Screen

Credentials

  • Specify PKCS1 and PKCS12 certificates needed to authenticate access to your network

SCEP

  • URL for SCEP server
  • Name
  • Subject (representation of X.500 name)
  • Subject Alternative Name Type (None, RFC 822 Name, DNS Name, Uniform Resource Identifier)
  • Subject Alternative Name Value
  • NT Principal Name
  • Challenge
  • Key Size: 1024, 2048
  • Use as digital signature
  • Use for key encipherment
  • Fingerprint (hex string)
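Two of these SCEP fields are easy to get wrong by hand. The Python sketch below combines the two key-usage checkboxes into a single integer, assuming Apple’s documented bitmask where 1 means signing and 4 means encryption (so 5 is both), and computes an uppercase hex fingerprint of a CA certificate’s DER bytes. Using SHA-1 for the fingerprint is an assumption; confirm which hash your SCEP server publishes.

```python
import hashlib

# Assumed bit values for the SCEP payload's key usage bitmask.
SIGNING, ENCIPHERMENT = 1, 4


def key_usage(digital_signature, key_encipherment):
    """Combine the 'Use as digital signature' and 'Use for key encipherment' options."""
    return (SIGNING if digital_signature else 0) | (ENCIPHERMENT if key_encipherment else 0)


def ca_fingerprint(der_bytes, algorithm="sha1"):
    """Uppercase hex fingerprint of a DER-encoded CA certificate."""
    return hashlib.new(algorithm, der_bytes).hexdigest().upper()


print(key_usage(True, True))    # → 5
print(key_usage(False, True))   # → 4
print(ca_fingerprint(b"abc"))   # placeholder bytes, not a real certificate
# → A9993E364706816ABA3E25717850C26C9CD0D89D
```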

MDM

  • MDM Server URL
  • Check in URL
  • Topic (Push notification topic for management messages)
  • Identity: Add credentials in Credentials payload, SCEP
  • Sign Messages: yes/no
  • Access Rights granted to remote administrators:
    • Query Device for:
      • Device Information
        • unique device identifier (UDID)
        • device name
        • iOS version
        • device model name and hardware version
        • serial number
        • overall and available storage capacity
        • IMEI number
        • the modem firmware version
        • SIM card ICCID
        • MAC addresses for integrated Wi-Fi and Bluetooth
        • carrier currently being used
        • the carrier specified by the current installed SIM card
        • the version of the carrier settings (APN) data
        • assigned phone number
        • whether or not data roaming is currently allowed
        • list of configuration profiles installed
        • list of installed security certificates and expiry dates
        • list of enforced restrictions
        • hardware encryption capability
        • whether an unlock passcode is set
        • installed applications (with App identifier, name, version, and size)
        • list of application provisioning profiles and expiry dates
    • General Settings
    • Security Settings
    • Network Settings
    • Restrictions
    • Configuration Profiles
    • Applications
    • Provisioning Profiles
  • Add / Remove:
    • Configuration Profiles
    • Provisioning Profiles
  • Security
    • Change device password
    • Remote Wipe
  • Apple Push Notification Service
    • Use Development APNS Server
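Once a device is enrolled, those access rights are exercised through commands. Roughly, in Apple’s MDM protocol the server sends a push notification on the configured Topic, the device checks in with the MDM Server URL, and the server replies with a command plist. A minimal Python sketch of a device-information query follows; the RequestType, Queries and CommandUUID key names and the query strings are assumptions based on Apple’s MDM protocol reference, and the push and HTTP transport are not shown.

```python
import plistlib
import uuid

# Assumed query names, mirroring several of the Device Information items above.
DEVICE_QUERIES = [
    "UDID", "DeviceName", "OSVersion", "ModelName", "SerialNumber",
    "DeviceCapacity", "AvailableDeviceCapacity", "IMEI",
    "ModemFirmwareVersion", "ICCID", "WiFiMAC", "BluetoothMAC",
    "CurrentCarrierNetwork", "SIMCarrierNetwork", "CarrierSettingsVersion",
    "PhoneNumber", "DataRoamingEnabled",
]


def mdm_command(request_type, **params):
    """Serialize one MDM command as an XML plist with a fresh CommandUUID."""
    body = {
        "CommandUUID": str(uuid.uuid4()),
        "Command": {"RequestType": request_type, **params},
    }
    return plistlib.dumps(body)


xml = mdm_command("DeviceInformation", Queries=DEVICE_QUERIES)
print(plistlib.loads(xml)["Command"]["RequestType"])  # → DeviceInformation
```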

Advanced

  • Access Point Name (APN): The name of the GPRS access point
  • Access Point User Name
  • Access Point Password
  • Proxy Server and Port