

ARM’s Cortex A7 Is Tailor-Made for Android Superphones

On Wednesday, ARM formally unveiled its next-generation smartphone processor, the Cortex A7, codenamed “Kingfisher.” But there was much more to the A7's launch than just the unveiling of a new processor architecture for smartphones. The chip company also announced plans to pair the A7 with the much larger and more powerful Cortex A15 in phones and tablets, using a technique called heterogeneous multiprocessing (or “big.LITTLE”, as ARM prefers to call it) to dynamically move lighter workloads from the larger, more power-hungry A15 to the leaner A7 in order to extend mobile battery life.

When used in a dual-core configuration, the A7 will bring the performance characteristics of what is currently a $500 phone to the $100 “feature phones” of 2013. These future feature phones will have the same capabilities as today’s high-end smartphones, but they’ll have the low prices and long battery life that the feature phone market demands. For the high-end “superphones” and tablets of 2013, the A7 will be paired with the much larger and more powerful A15 core to yield a processor that sips power like a feature phone when all you’re doing is some light web surfing, but can crank up the juice when you’re gaming.

The Cortex A7

ARM claims that the A7 will double the performance of its existing Cortex A8 family through a combination of process shrinks and improvements at the level of microarchitecture. Or, as ARM processor division chief Mike Inglis put it at the launch event, “Outpacing Moore’s Law with microarchitectural innovation is what we’ve been working on with A7 as a product.” Though Inglis never mentioned this specifically, I learned that we can actually thank Google’s “open” smartphone OS, Android, for some of that innovation.

The A7's design improvements over the older A8 core are possible because ARM has had the past three years to carefully study how the Android OS uses existing ARM chips in the course of normal usage. Peter Greenhalgh, the chip architect behind the A7's design, told me that his team did detailed profiling in order to learn exactly how different apps and parts of the Android OS stress the CPU, with the result that the team could design the A7 to fit the needs and characteristics of real-world smartphones. So in a sense, the A7 is the first CPU that's quite literally tailor-made for Android, although those same microarchitectural optimizations will benefit any other smartphone OS that uses the design.

The high-level block diagram for the A7 released at the event reveals an in-order design with an 8-stage integer pipeline. At the front of the pipeline, ARM has added three predecode stages, so that the instructions in the L1 cache are appropriately marked up before they go into the decode phase. Greenhalgh told me that the A7 has extremely hefty branch prediction resources for a design this lean, so I'm guessing that the predecode phase involves tagging the branches and doing other work to cut down on mispredicts.

(Note that branch prediction is one of the best places to spend transistor resources where you get not only greatly improved performance but also improved power efficiency. The power of branch prediction for boosting performance/watt was one of the major revelations that Intel’s Banias (Pentium M) team first brought to the Intel product line. So it makes sense that the A7 has gone all-out here.)

After the decode phase, two instructions per cycle can issue through one of five issue ports to the machine’s execution core. This execution core consists of an asymmetric integer arithmetic-logic unit (ALU), where one pipe is a full ALU and the other is limited to simpler operations. There’s also a multiply pipe for complex integer operations, a floating-point NEON pipe for floating-point and SIMD ops, and a Load/Store pipe for memory ops.

The feature set for the A7 is identical to that of the Cortex A15—this is critical, because when A7 is paired with A15 in a big.LITTLE configuration the two cores have to be identical from a software perspective.

big.LITTLE: Wave of the future, or compromise?

As important as the launch of a new core design is, ARM's heterogeneous multiprocessing plans are perhaps the biggest news to come out of Wednesday's event. big.LITTLE links a dual-core A15 and a dual-core A7 with a cache-coherent interconnect, and it covers the pair with a layer of open-source firmware that dynamically moves tasks among the cores depending on those tasks' performance and power needs.

The OS doesn’t actually need to be modified or to be at all aware of the smaller A7 cores in order to take advantage of the technology. All popular mobile and desktop OSes now ship with dynamic voltage and frequency scaling (DVFS) capabilities, so that they can tell the CPU when they need more horsepower and when they need less. For lighter workloads, a typical CPU responds to the OS’s signal by throttling back its operating frequency and lowering its voltage, thereby saving power; for heavier workloads, it can burst the frequency and voltage higher temporarily to provide a performance boost. The open-source firmware layer that will sit between the OS and a big.LITTLE chip can take these standard signals and, instead of downclocking the A15 when the OS asks for less horsepower, simply move the workload onto the A7 cores. So while it will be possible to modify an OS to be big.LITTLE-aware, it’s not necessary in order to take advantage of the capability.
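
For a rough sense of how such a firmware layer might behave, here is a minimal Python sketch; the class name, method names, and frequency thresholds are illustrative assumptions, not ARM's actual firmware interface.

```python
# Hypothetical sketch of the switching logic a big.LITTLE firmware layer
# might implement; the thresholds and cluster names are illustrative only.

A7_MAX_FREQ_MHZ = 1200   # assumed top speed of the little cluster
A15_MIN_FREQ_MHZ = 1000  # assumed bottom of the big cluster's DVFS range

class BigLittleSwitcher:
    def __init__(self):
        self.active_cluster = "A7"

    def on_dvfs_request(self, requested_freq_mhz):
        """Called when the OS asks for a new operating point via DVFS."""
        if requested_freq_mhz <= A7_MAX_FREQ_MHZ:
            # Light load: the little cores can cover it, so migrate there
            # instead of just downclocking the A15s.
            self._migrate_to("A7")
            return min(requested_freq_mhz, A7_MAX_FREQ_MHZ)
        # Heavy load: wake the big cluster and run the task there.
        self._migrate_to("A15")
        return max(requested_freq_mhz, A15_MIN_FREQ_MHZ)

    def _migrate_to(self, cluster):
        if cluster != self.active_cluster:
            # In real hardware this is where execution state would move
            # across the cache-coherent interconnect.
            print(f"migrating workload {self.active_cluster} -> {cluster}")
            self.active_cluster = cluster
```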

To take a step back, there are two ways to look at big.LITTLE. The first way is to go with ARM’s angle, which is that heterogeneous multiprocessing gives you the best of both worlds by letting you scale processor frequency and voltage to much lower levels than would otherwise be possible by simply moving the load to a leaner core. Take a look at the power vs. performance chart below, and you can see that in the big.LITTLE configuration the A7 essentially extends the A15’s DVFS curve to much lower levels.

I have a lot of sympathy for this angle, because heterogeneous multiprocessing represents a power-efficient use of Moore’s Law that is becoming increasingly popular. Heterogeneous multiprocessing first cropped up as the Next Big Thing on the server side many years ago with Sun’s ill-fated MAJC architecture. Then there was the AMD/ATI merger, at which point AMD started talking about heterogeneous multiprocessing and “accelerated processing units”—or APUs—instead of the traditional CPU/GPU division. More recently, Intel has been talking up the potential of heterogeneous multiprocessing in the cloud.

On the client and consumer side, heterogeneous multiprocessing made its big debut in the PlayStation 3’s Cell processor. More recently, Marvell has been using this approach for over a year, and NVIDIA’s “Kal-El” ARM chip uses it as well.

So as an overall approach to boosting power efficiency and even raw performance, ARM’s big.LITTLE has been quite thoroughly validated across the industry. Indeed, ARM is actually late to this particular party. All of this is to say that I have no doubt that heterogeneous MP is going to do good things for the smartphone space, because it’s one of the most widely recognized Good Ideas for what to do with the embarrassment of cheap transistor riches that Moore’s Law has given us.

But then there’s my more pessimistic side, which thinks that, in addition to being a good idea, big.LITTLE is also a bit of a hack that was necessitated by a combination of ARM’s server ambitions and its constrained engineering resources. Back before the A15 was publicly launched, I began hearing from sources in the semi industry who were privy to the details of the design and who weren’t particularly pleased at some of the tradeoffs that ARM made. The scuttlebutt was that ARM was clearly gunning for the cloud server market with this chip—the same “microserver” space that Intel is now attacking with part of its Atom line—and some of the A15’s design decisions were going to hurt it in the tablet space.

When A15 was unveiled, it was clearly a very robust, full-featured, out-of-order design that was intended to compete in the server and desktop markets with Intel CPUs. Of course, ARM will be able to fit this design into tablets and phones, especially at the right process node. My only point is that the company could have done a more straightforward and mobile-friendly iteration of the A9 if they either 1) didn’t have one eye on the server space with A15, or 2) had the resources to do both a full-blown server part and a high-end smartphone/tablet part at the same time.

In this light, big.LITTLE can be seen as ARM’s attempt to have its cake and eat it, too. It gets to address the high-end by cramming as much hardware as it can into the A15 while still calling it a mobile design, but in smartphone usage situations where A15 will be overkill the much smaller, leaner A7 will be there to take over and conserve battery life.

If the PC Is Dead, Someone Forgot to Tell Intel’s Customers

Another quarter, another fresh set of earnings records for Intel. As has been the case for the past few quarters, the Tuesday release of Intel’s third-quarter earnings shows that revenue is up on both the client and server sides of Intel’s business. As a tech industry bellwether, Intel’s results are always most informative when broken out by vertical, so let’s take a look at what happened this past quarter.

The first take-home from Intel’s quarterly results is that the rumors of the PC’s demise have been greatly exaggerated. Intel’s PC Client Group revenue is up 22 percent year-over-year, a healthy jump that (not coincidentally) echoes the 26 percent year-over-year jump that Apple’s most recent quarterly results saw in Mac sales. The one segment of the PC market that “post-PC” pundits have been right about is netbooks—Atom sales are down a whopping 24 percent sequentially and 32 percent year-over-year. To once again take a sideways glance at Apple’s earnings, the iPad is up 166 percent year-over-year in units shipped, so it seems likely that any tablet cannibalization of the PC is confined to the netbook segment (which consumers have hated for a long time anyway).

The Data Center Group’s revenues were up 15 percent year-over-year—a solid gain, but not as big of a jump as we’ve seen in previous quarters. Nonetheless, this number is just going to keep climbing right along with the number of Internet-connected devices.

Intel’s Other Intel Architecture group had the most robust growth of all the groups, posting a 68 percent jump in revenues year-over-year. But there’s a lot less going on with this number than meets the eye. This group includes the following units (the text below is mostly from this Intel page, but I’ve added comments):

  • Intel Mobile Communications: Delivering mobile phone components such as baseband processors, radio frequency transceivers, and power management chips. So this is where the recently acquired Infineon resides.
  • Intelligent Systems Group: Delivering microprocessors and related chipsets for embedded applications. This group used to be called the Embedded and Communications Group. Hopefully the name change means that Intel has given up pushing their very idiosyncratic definition of the term “embedded,” which basically works out to “not in a PC.” So a Xeon chip in a router is “embedded” by Intel standards.
  • Netbook and Tablet Group: Delivering microprocessors and related chipsets for the netbook and tablet market segments. It’s not clear why this is separate from the Atom group, since Atom is what goes into netbooks and tablets … or, rather, it’s what would go into tablets if there were an x86 tablet market to speak of, but there isn’t.
  • Digital Home Group: Delivering Intel architecture-based products for next-generation consumer electronics devices. This would include SmartTV and Intel’s digital health products.
  • Ultra-Mobility Group: Delivering low-power Intel architecture-based products in the next-generation handheld market segment.

Of the segments above, the Infineon purchase is behind the majority of the gains in this group. Intel bought a very solid baseband business in Infineon and just rolled that into Other IA, hence the big jump. All told, Infineon made up almost half of this quarter’s Other IA revenue. The Intelligent Systems segment was down slightly, and the rest of these segments are either nascent (digital home, ultra-mobility) or they’ve flopped (netbook and tablet).

The other place where a recent acquisition of a stable, profitable business gave an Intel group a boost was in the Software and Services Group, where revenue was up 720 percent year-over-year thanks almost entirely to the McAfee acquisition.

Overall this was another blockbuster quarter for Intel, with revenues up 9 percent sequentially and 28 percent year-over-year.

As far as the implications for the larger economy, Intel reported slower growth in the consumer segment of mature markets—most of the demand strength this quarter was from enterprise and emerging markets. This fits with the recent string of warnings out of nearly every corner, from the IMF and OECD to the heads of many major banks, that growth in the developed West is slowing down. But given Intel’s strength in emerging markets, that slowdown will have to spread before it threatens the company’s winning streak.

Have any news tips, or just want to send me feedback? You can reach me at jon underscore stokes at wired.com. I’m also on Twitter as @jonst0kes, and on Google+.

Image: eurleif/Flickr

The Quest for the Holy Grail of Storage … RAM Cloud

The cloud has a big problem on its hands: Cloud storage is failure-prone, slow, and infinitely quirky. Anyone whose platform has been taken offline by one of Amazon’s periodic elastic block storage (EBS) outages can vouch for the fact that reliably storing data in the cloud is a very hard problem, and one that we’re only just beginning to solve.

Recently, solid-state disks (SSDs) have arisen as an answer to the performance part of the cloud storage challenge, but there are deeper problems with scalability and synchronization that merely moving the same databases from hard disk to SSD doesn’t necessarily solve. That’s why a group at Stanford has a radical suggestion: Datacenters should just put everything in RAM.

The proposed system, which the researchers are calling RAMcloud, would be a one-size-fits-all solution to the cloud storage problem that would replace the mosaic of SQL, NoSQL, object, and block storage solutions that cloud providers currently use to address different storage niches. And if it achieves that goal, then RAMcloud could be the badly needed breakthrough that does for cloud databases what Microsoft Access and Visual Basic did for relational databases by bringing the technology firmly within the grasp of ordinary programmers and analysts.

The datacenter as a giant RAM disk

At first glance, the idea of moving entire datacenters’ worth of storage to what is essentially a giant RAM disk might seem totally infeasible from a cost perspective. After all, DRAM is far more expensive in terms of cost per bit than magnetic storage, so how could a datacenter possibly afford to ditch disks for RAM? It turns out that at cloud scale, the cost-per-bit issue begins to move in DRAM’s favor.

The Stanford team points out that between its memcached implementation and the RAM that’s on its actual database servers, Facebook was already storing 75% of its data in RAM as of August 2009. All of this data ends up cached in RAM—either explicitly via memcached, which is a RAM-based key-value store that can greatly speed up database access times, or implicitly via system memory—because hard disks are just too slow and too large. So when you load a page from Facebook, the vast majority of data from that page—if not all of it—is already being fetched from RAM. Therefore what the authors are proposing isn’t a giant leap—it’s more like the final, incremental step to an all-RAM storage solution.

The paper also cites Jim Gray’s famous five-minute rule for trading disk accesses for memory, pointing out that, “with today’s technologies, if a 1KB record is accessed at least once every 30 hours, it is not only faster to store it in memory than on disk, but also cheaper (to enable this access rate only 2 percent of the disk space can be utilized).” So as disk densities go up, RAM actually gets cheaper for random accesses. (The situation is different for sequential accesses; see this classic interview with Gray on the topic of disk densities and random access frequency.)
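
To make the trade concrete, here is a back-of-the-envelope version of that break-even calculation; the disk price, IOPS figure, and DRAM pricing below are illustrative assumptions rather than numbers taken from the paper.

```python
# Sketch of a "five-minute rule" style break-even calculation.
# All prices and IOPS figures below are illustrative assumptions.

DISK_PRICE_USD = 80.0         # assumed price of one commodity hard disk
DISK_RANDOM_IOPS = 100.0      # assumed random 1KB reads per second per disk
DRAM_PRICE_PER_GB_USD = 10.0  # assumed DRAM price per gigabyte
RECORD_KB = 1.0

# Cost of keeping one 1KB record permanently in DRAM.
dram_cost_per_record = DRAM_PRICE_PER_GB_USD * RECORD_KB / (1024 * 1024)

# If a record is touched once every T seconds, one disk can serve at most
# DISK_RANDOM_IOPS * T such records before it is access-limited rather than
# capacity-limited. Break-even is where the per-record share of the disk's
# price equals the DRAM cost of holding the record.
break_even_seconds = DISK_PRICE_USD / (DISK_RANDOM_IOPS * dram_cost_per_record)
print(f"break-even access interval: {break_even_seconds / 3600:.0f} hours")
# With these assumptions the answer lands in the tens of hours, the same
# ballpark as the 30-hour figure quoted above.
```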

The upshot of all of this is that moving everything into RAM would not only be faster than disk, but it would also be cheaper under certain common circumstances.

The de-ninjafication of cloud storage

The benefits of RAMcloud would go beyond mere speed and cost, though. There’s a growing consensus in the cloud storage community that there will ultimately be no “one-size-fits-all” solution for cloud storage, and that the current patchwork of different storage solutions that fall into different quadrants of the scalability vs. latency vs. consistency vs. complexity map represents the new normal. No one storage technology, the thinking goes, can possibly be all things to all customers, the way that the relational database system was in a previous era.

This heterogeneity would probably be fine for cloud storage, if it weren’t for the CIO confusion and loss of programmer productivity that the complexity engenders. By now, everyone sort of knows what Hadoop is (or if they don’t, then they aren’t admitting it), but Hadoop is just one member of a large and growing number of ways to store, retrieve, and transform bits in the cloud. It takes time and deep expertise to acquire and maintain a thorough grasp of each one of the myriad cloud storage options, and technical people who can do this are in short supply.

(Related to this point, I’ve talked in various places about the need for the cloud in general to become “de-ninjafied”, because as it currently stands the complexity associated with getting maximum productivity out of many PaaS and IaaS platforms is so great that few coders have the necessary skills to do so. See my recent piece on the cloud talent crunch, and this followup exchange with Felix Salmon. In the latter, I elaborate on my point about the need to bring cloud programming down to a level where more casual programmers can take full advantage of the full range of storage and compute resources that the cloud offers.)

The folks behind RAMcloud think that they can cut the Gordian knot of storage solutions in one mighty stroke, and offer a single storage abstraction that satisfies everyone’s needs for consistency, scalability, and speed while being easy to use at the same time. The key lies in DRAM’s greatest strength versus every other form of storage, including flash: lightning-fast latency.

Latency, consistency, NoSQL

The RAMcloud team hopes to get access latencies all the way down to 5-10 microseconds (compared with around 200 milliseconds for current technologies). That’s because the lower latencies go, the less time each individual read or write instruction spends in the system. And as instructions spend less time in flight, the critical window during which two accesses to the same byte could accidentally overwrite each other shrinks, so it gets easier to maintain data consistency across ever larger storage pools.

For example, read-after-write is a common occurrence in any storage pool, whether it’s a hard disk or a database. In this scenario, a write to a particular file or record is followed immediately by a read from that same file or record. For a database to guarantee its users that every read will return only the most up-to-date data, it must first confirm for every read that there are no prior, pending writes working their way through the system from somewhere else to modify the target record; and if there are such pending writes, then the system must wait for them to complete before returning the results of the read command. So the longer it takes for writes to work their way through the system, the longer it takes for reads to return up-to-date data.
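
Here is a toy sketch of that constraint, assuming a hypothetical ConsistentStore that tracks in-flight writes per key; it is not RAMcloud's protocol, but it shows why slow writes directly inflate the latency of consistent reads.

```python
# Toy illustration of read-after-write consistency: a read for a key cannot
# be answered until every write to that key that entered the system earlier
# has been applied. The class and method names are assumptions for
# illustration only.
import threading

class ConsistentStore:
    def __init__(self):
        self.data = {}
        self.pending_writes = {}      # key -> count of in-flight writes
        self.lock = threading.Condition()

    def begin_write(self, key):
        with self.lock:
            self.pending_writes[key] = self.pending_writes.get(key, 0) + 1

    def complete_write(self, key, value):
        with self.lock:
            self.data[key] = value
            self.pending_writes[key] -= 1
            self.lock.notify_all()

    def read(self, key):
        with self.lock:
            # A consistency-guaranteeing store must stall here until all
            # earlier writes to this key have landed; the longer a write
            # spends "in flight", the longer this wait can be.
            while self.pending_writes.get(key, 0) > 0:
                self.lock.wait()
            return self.data.get(key)
```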

In order to get around this problem, many NoSQL storage offerings simply dispense with the guarantee that reads will return up-to-date records. Because the vast majority of reads are not read-after-write, a database without this guarantee will behave just like a database with the guarantee well over 90 percent of the time, only much, much faster. But every now and then, the NoSQL database will return out-of-date data to a user, because it’s reading from an address that has yet to be updated by a prior inbound write that is coming in from a node that’s further away.

For a platform like Facebook, there are many places where stale data is just fine, hence the widespread use of NoSQL there. If I live three miles away from the Facebook datacenter that houses my friend’s profile, and my friend lives 1,000 miles away, then neither of us really cares if my browser’s copy of his profile is out-of-date because he clicked “update” some 600ms before I clicked his profile link, and my read beat his write to the datacenter. Facebook would rather give me much faster profile loads in exchange for a slightly stale bit of data every now and then.

Many business-oriented database applications, however, could never tolerate this kind of read-after-write sloppiness, even though they would desperately like to get their read latencies as low as possible. For instance, high-speed finance is the perfect example of a market for cloud storage that will pay top dollar for maximum performance but has low tolerance for inaccuracies arising from the kinds of consistency issues described above. But traditional relational databases just can’t scale the way that many of these customers would like.

Relational databases scale poorly because the length of time it takes for writes to travel through the system to their targets grows as the storage pool grows to encompass data stored on multiple networked machines, which means that for a classic relational database the read latency also grows as the system grows. So a relational database’s overall performance rapidly deteriorates as it scales outward across multiple systems.

In the current storage context, where users are forced to choose between scalability and consistency, more and more of them are choosing scalability. But safely using these inconsistent databases in critical business applications takes a ton of programmer effort and expertise. If RAMcloud is successful in offering both consistency and scalability, then a number of users can ditch NoSQL and go back to a traditional, easier-to-use RDBMS.

Ultimately, though, what RAMcloud offers isn’t an either-or proposition. Rather, the promise is that RAMcloud’s low latency will let a database with whatever level of consistency—from a fully ACID-compliant RDBMS to a NoSQL offering with fewer consistency guarantees—scale out much further horizontally than would otherwise be possible. This will bring the ACID guarantees back within the reach of some database users whose scalability needs had grown past the point where they could use an RDBMS; these shops can cut significant cost and complexity out of the IT-facing side of their data storage solution by just going back to good old relational databases on RAMcloud.

Challenges ahead

The major implementation challenges to the RAMcloud idea are obvious to anyone who has ever lost some work to an unexpected power outage or reboot. Because DRAM is volatile, the RAMcloud will have to use some combination of hard disk writes and node-to-node replication to achieve consistency. The problem with the former is that it can easily put you right back into the read-after-write latency trap, while the latter solution will massively boost RAMcloud’s cost per bit (i.e. if you have to copy every byte of data to RAM on three other nodes for redundancy’s sake, then it takes three times the amount of RAM to store each byte).
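
The cost penalty of the in-memory replication approach is simple arithmetic, sketched below with an assumed DRAM price (the $/GB figure is an assumption, not a number from the paper).

```python
# Illustrative arithmetic only: replication multiplies the cost of every
# stored byte by the number of copies kept in RAM.
DRAM_PRICE_PER_GB_USD = 10.0   # assumed DRAM price per gigabyte
copies_in_ram = 3              # primary plus in-memory replicas, per the text
effective_price_per_gb = DRAM_PRICE_PER_GB_USD * copies_in_ram
print(f"${effective_price_per_gb:.2f} per usable GB")  # $30.00, triple the raw cost
```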

Then there’s the problem of scaling this solution across datacenters. Even if RAMcloud can get its internal latencies down into the microsecond regime using commodity hardware alone, it will be very hard to retain the solution’s latency-related advantages when scaling across multiple, geographically disparate datacenters.

So the challenges facing RAMcloud are huge, but so is the potential upside. If RAMcloud can put database consistency back on the table for storage clusters with hundreds and thousands of nodes, then it could go a long way toward simplifying storage to the point that nonspecialists can build productive database solutions on top of the cloud the way they once did on top of Microsoft Access.

Have any news tips, or just want to send me feedback? You can reach me at jon underscore stokes at wired.com. I’m also on Twitter as @jonst0kes, and on Google+.

Photo: Flickr/ShutterCat7

With Siri, Apple Could Eventually Build A Real AI

As iPhone 4S’s flood into the hands of the public, users are coming face-to-face with something that they weren’t quite expecting: Apple’s new voice interface, Siri, has an attitude. Ask Siri where to hide a body, for instance, and she’ll give you a list of choices that include a reservoir, a mine, and a swamp. Ask her how much wood a woodchuck would chuck if a woodchuck could chuck wood, and she might tell you the answer is 42 cords, or she might ask you to clarify whether it’s an African or European woodchuck.

Joshua Topolsky’s This Is My Next began gathering some of the service’s cheekier answers on Wednesday, and now there’s a Tumblr up called Shit That Siri Says which houses an even more extensive, growing collection.

Siri’s answers are cute, but they’re not much different from the “Easter eggs” that sly coders have been slipping into software for decades. Or are they? I want to suggest, in all earnestness, that as Siri’s repertoire of canned responses grows, Apple could end up with a bona fide artificial intelligence, at least in the “weak AI” sense. Siri may be yet another chatterbot, but it’s a chatterbot with a cloud back-end, and that cloudy combination of real-time analytics and continuous deployment makes all the difference.

The roots of intelligence: algos or data?

In its initial incarnation, the part of Siri’s interaction model that responds to jokes, insults, and other casual queries that are merely intended to probe the machine for a clever response puts it in the venerable category of chatterbots. The chatterbot lineage can be traced back to ELIZA, which was a primitive interactive program that would take English-language input from the user and spit it back out in the form of a question. ELIZA was originally intended as a parody of psychotherapy, and an example exchange might go something like the following:

USER: I feel sad.
ELIZA: Why do you feel sad?
USER: Because I made a mistake
ELIZA: Why did you make a mistake?
USER: I have the flu
ELIZA: Maybe you should see a doctor. I’m merely a psychotherapist.
USER: Habla Espanol?
ELIZA: Now you’re not making any sense!

A chatterbot like ELIZA uses a mix of natural language processing (NLP) and canned responses to take the user’s input and transform it into some kind of intelligible, grammatically correct output. The hard part of making a good chatterbot is the NLP portion. For instance, the program in the example above has to know that “make” is the present tense of “made,” so that it can turn “Because I made a mistake” into “Why did you make a mistake?”. This kind of productive, algorithmic knowledge about how to combine a limited vocabulary of nouns, verbs, and modifiers into syntactically correct and at least superficially relevant English is difficult to code.

So the art and science of chatterbot coding as it has been practiced since the dawn of UNIX is in designing and implementing a set of NLP algorithms that can take a finite vocabulary of words and turn them into legit-sounding English sentences. The easy part, at least from a computer science perspective, is in cooking up a complementary slate of pre-packaged answers that are mere strings produced in response to a set input pattern, which the chatterbot produces in specific situations, like when it doesn’t quite know what to say.

For example, in the above dialog, ELIZA might be hard-coded to match the pattern “have the flu” in the user’s input with the output string “Maybe you should see a doctor. I’m merely a psychotherapist.” This kind of string-to-string mapping doesn’t require any kind of NLP, so there’s no “AI” involved in the popular sense. Ultimately the success of the canned answers approach to chatterbot making hinges not on the intelligence of the algorithm but on the tirelessness of the coder, who has to think of possible statement/response pairs and then hard-code them into the application. The more statement/response, or input/output pairs she dreams up to add to the bot, the more intelligent the bot is likely to appear as the user discovers each of these “Easter eggs” in the course of probing the bot’s conversational space.
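
A minimal sketch of both ingredients (the canned string-to-string pairs and a crude grammatical rewrite) might look like the following; the patterns and rewrites are invented for illustration and are not taken from ELIZA's actual script.

```python
# A minimal ELIZA-style sketch, not the historical program: hard-coded
# input/output pairs for specific patterns, plus a crude grammatical
# transform as a fallback.
import re

CANNED_RESPONSES = {
    "have the flu": "Maybe you should see a doctor. I'm merely a psychotherapist.",
    "habla espanol": "Now you're not making any sense!",
}

# Very small "NLP": map a first-person statement onto a question.
REWRITES = [
    (re.compile(r"^because i made (.+)$"), r"Why did you make \1?"),
    (re.compile(r"^i feel (.+)$"), r"Why do you feel \1?"),
]

def respond(user_input):
    text = user_input.lower().strip(" .?!")
    # Easy part: canned string-to-string mappings.
    for pattern, reply in CANNED_RESPONSES.items():
        if pattern in text:
            return reply
    # Hard part (here reduced to two regexes): transform the input into a
    # grammatically plausible question.
    for regex, template in REWRITES:
        if regex.match(text):
            return regex.sub(template, text)
    return "Tell me more."

print(respond("I feel sad."))       # Why do you feel sad?
print(respond("I have the flu"))    # Maybe you should see a doctor...
```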

An adult user will quickly exhaust the conversational possibilities of a chatterbot that has a hundred, or even a thousand, hard-coded input/output pairs. But what about 100,000 such pairs? Or 1 million? That’s where the cloud makes things interesting.

Big Data, big smarts

In the traditional world of canned, chatterbot-style “AI,” users had to wait for a software update to get access to new input/output pairs. But since Siri is a cloud application, Apple’s engineers can continuously keep adding these hard-coded input/output pairs to it. Every time an Apple engineer thinks of a clever response for Siri to give to a particular bit of input, that engineer can insert the new pair into Siri’s repertoire instantaneously, so that the very next instant every one of the service’s millions of users will have access to it. Apple engineers can also take a look at the kinds of queries that are popular with Siri users at any given moment, and add canned responses based on what’s trending.
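
Conceptually, the cloud version reduces to a server-side lookup table that engineers can append to at any time, with every user's next query seeing the new entry. The sketch below is purely illustrative; the function names and the example pair are assumptions, not Apple's implementation.

```python
# Hypothetical sketch: a server-side response table shared by all users,
# so a newly added pair is live for everyone immediately, with no client
# software update.

RESPONSE_TABLE = {}   # lives in the cloud, shared by every client

def add_pair(trigger, reply):
    """Called by an engineer (or a trend-driven pipeline) at any time."""
    RESPONSE_TABLE[trigger.lower()] = reply

def handle_query(user_text):
    """Every query consults the live table, so new pairs show up instantly."""
    return RESPONSE_TABLE.get(user_text.lower(), "I don't understand.")

add_pair("Where should I hide a body?",
         "What kind of place are you looking for? Reservoirs, mines, swamps...")
print(handle_query("where should i hide a body?"))
```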

In this way, we can expect Siri’s repertoire of clever comebacks to grow in real-time through the collective effort of hundreds of Apple employees and tens or hundreds of millions of users, until it reaches the point where an adult user will be able to carry out a multipart exchange with the bot that, for all intents and purposes, looks like an intelligent conversation.

Note that building an AI by piling Easter egg on top of Easter egg in the cloud isn’t solely the domain of Apple’s Siri. When Google does exactly this—for instance, by showing a five-day weather graphic in response to a local weather search, or by displaying local showtimes in response to a movie search—it’s called a “feature,” not an “Easter egg,” though it’s the same basic principle of “do this specific, clever thing when the user gives this specific input.” Indeed, Google has been at this approach for quite a long time, so I expect that they will shortly be able to reproduce much of Siri’s success on Android. They have the voice recognition capability, the raw data, and the NLP expertise to build a viable Siri competitor, and it seems certain that they’ll do it.

But is it a “real” AI?

A philosopher like John Searle will object that, no matter how clever Siri’s banter seems, it’s not really “AI” because all Siri is doing is shuffling symbols around according to a fixed set of rules without “understanding” any of the symbols themselves. But for the rest of us who don’t care about the question of whether Siri has “intentions” or an “inner life,” the service will be a fully functional AI that can respond flawlessly and appropriately to a larger range of input than any one individual is likely to produce over the course of a typical interaction with it. At that point, a combination of massive amounts of data and a continuous deployment model will have achieved what clever NLP algorithms alone could not: a chatterbot that looks enough like a “real AI” that we can actually call it an AI in the “weak AI” sense of the term.

Have any news tips, or just want to send me feedback? You can reach me at jon underscore stokes at wired.com. I’m also on Twitter as @jonst0kes, and on Google+.

Crisis in the Cloud: How the Tech Bubble Stifles Innovation and Hampers Cloud’s Adoption

Photo: Brock Davis

The start of 2011 marked the moment that “innovation” arrived as the buzzword on the lips of everyone from the president of the Consumer Electronics Association to the president of the United States. Invoked in totemic tones in one speech after another, “innovation” has been stressed by business leaders (often in contrast to “regulation”) as the cure for the world’s economic and cultural ills.

But while innovation enthusiasm was rising among business and government leaders, it also became fashionable in certain quarters to fret that we’re in the midst of an enduring innovation drought. And if my conversations with venture capitalists and entrepreneurs are any indication, then even the cloud—supposedly the cutting edge of technological progress—is facing an uphill battle to innovate. The reason? A combination of easy money and cheap transistors (i.e., Moore’s Law). Even worse, these factors are also creating a kind of hidden barrier to cloud adoption that’s potentially larger than the standard “barrier” issues of security, reliability, regulatory compliance, and vendor lock-in.

Innovation and the talent shortage

In January of this year, right as the innovation-speak reached fever pitch, prolific blogger and economist Tyler Cowen launched the “innovation stagnation” conversation into the punditry mainstream with his best-selling Kindle Single, The Great Stagnation. In this weakly argued essay, which serves mainly to demonstrate Paul Krugman’s point about how little attention high-profile libertarians and other “freshwater” economist types pay to the opposing team’s work (I’m thinking specifically of Marxist economic historian Robert Brenner’s dense, data-rich oeuvre on innovation and “the long downturn”), Cowen begins with the observation that technological innovation has essentially flatlined since 1973. He then claims that America has eaten up all of the “low-hanging fruit”—like free land and a smart yet under-educated populace—that powered previous waves of innovation.

More recently, Cowen’s arch-libertarian fellow traveler Peter Thiel penned an essay in the National Review that also aimed to illuminate how and why the innovations of the past few decades have failed to measure up to those of the Cold War era and before. And now novelist Neal Stephenson has picked up the baton with an essay entitled “Innovation Starvation,” which, I kid you not, blames the problem in part on a dearth of good sci-fi.

All of these authors are focused on a perceived lack of fundamental, earth-shaking innovation of the Apollo 11, “one giant leap for mankind” variety. But for my part, I’m interested specifically in cloud computing, a field where change comes so quickly and where jobs are so specialized that it’s near impossible for one person to stay on top of everything important. The rhetoric around “cloud” might give you the impression that the term is practically synonymous with “innovation,” but in talking to those in the trenches, the reality is a bit different.

I recently had a sit-down chat with Ping Li, a venture capitalist at Accel Partners who does investments across the layers of the cloud stack. Over the course of our conversation, Ping expressed frustration about the difficulty of hiring and maintaining talent right now. “It’s this heated funding environment,” he said, going on to explain that all of the money sloshing around in the Valley had created a market for talent that’s just as tight as it was during the dotcom boom. What’s worse, he explained, is that the talent shortage is stifling fundamental innovation in the cloud space.

To do really fundamental engineering innovation of the kind that was done, say, in the early days of Google and VMware, you need to hire and retain teams of talented engineers. But in today’s go-go funding environment, top engineers are being enticed with truckloads of money to break off and form two- and three-person startups. This phenomenon, explains Li, is why “many of the really big innovations happen in less frothy times.” He did go on to clarify that “some great companies do get created in these times (like Amazon in the last bubble). It’s just harder given talent shortage.”

Li’s comments are by no means the first I’ve heard in this vein. I recently talked to another well-known Valley entrepreneur who told me that his startup had poached top talent from a rival, only to see that talent leave and form a new VC-backed startup. Then there’s the email I got a few months ago from a friend of mine and product manager at Apple, who was wondering if I knew any cloud computing hackers that they could hire. When we get to the point where Apple product managers on the client side are reaching out to their personal networks in search of cloud coding talent for the world’s largest tech company, you know it’s bad out there.

I’m facing this in my own project, a small book publishing startup where I moonlight as Chief Product Officer. Our platform is written in Ruby and hosted on Heroku, a pair of technical decisions that we made so that we could easily and painlessly scale, and so that we wouldn’t have to waste resources on any sort of sysadmin work. Back in the depths of the last downturn, we were fortunate to have found a team of contract developers who are very talented and who are now therefore very, very busy.

This is par for the course. We’ve heard other startups complain that all of the big Ruby shops are taking only large jobs right now, because they’re so maxed out that there’s no bandwidth for small startup projects or overflow. So if our team gets run over by a bus on the way to an off-site, or vacuumed up by a VC with a giant bankroll, then all of the cloud-based redundancy and scalability in the world won’t get new features pushed out on our platform.

Talent as a hidden barrier to cloud adoption

The tight state of the current talent market has another, second-order effect on cloud innovation that goes beyond the team size issue that Ping Li points out. The talent shortage is also a factor in the slow pace of cloud adoption in the enterprise, or, at least, that’s one way of reading the results of Symantec’s newly released State of the Cloud 2011 survey. The survey asked IT departments about their staff’s readiness to make the leap into the cloud, and here’s what it found:

About half of the organizations surveyed said their IT staff is not ready for the move to cloud. While a handful (between 15 and 18 percent) rated their staff as extremely prepared, roughly half rated their IT staff as less than somewhat prepared. Part of the reason for this hesitancy is their staff’s lack of experience. Less than 1 in 4 computer staffers have cloud experience. As discussed earlier, the adoption of cloud changes how IT works, so experience is absolutely crucial for IT.

The survey goes on to find that despite the excitement around cloud, most companies’ cloud initiatives are stalled in the discussion/trial phase.

Moore’s Law: too much of a good thing?

I said at the outset that there are two factors working against innovation in the cloud, but so far I’ve only talked about the first: easy money. The second factor is Moore’s Law. Advances in microprocessor technology are producing far more hardware than programmers can collectively or individually program.

On the enterprise side, the move to the cloud is driven partly by uncertain economic conditions and is based on an infrastructure layer that offers more performance per dollar with every upgrade cycle. So as economic fear continues to reign and commodity datacenter hardware gets cheaper in cost-per-MIPS, the pull of the cloud will get even stronger. And more dollars flowing into cloud hardware means that the demand will grow for cloud-savvy talent that can put the hardware to productive use.

Not only does the profusion of cheap computing power create an aggregate demand for more cloud programmers, but it also taxes individual programmers like never before. Coding, deploying, and maintaining highly parallel cloud apps is hard. It’s notoriously challenging to architect for parallelism at the design level, because it’s hard for mere mortals to reason about parallelized tasks, especially if they’re non-deterministic (and they often are). It’s also hard to meaningfully test software on today’s large, massively parallel clouds; the logistics are a huge challenge.
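
A tiny example, assumed purely for illustration, shows the kind of non-determinism involved: several threads incrementing a shared counter without synchronization can lose updates, and the final total can vary from run to run.

```python
# A classic race condition: the read-modify-write in "counter += 1" is not
# atomic, so concurrent threads can overwrite each other's updates.
import threading

counter = 0

def work(iterations):
    global counter
    for _ in range(iterations):
        counter += 1   # unsynchronized update; increments can be lost

threads = [threading.Thread(target=work, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Expected 400000, but depending on interpreter and timing the total can
# come up short, and it can differ from one run to the next.
print(counter)
```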

Are taxes the answer?

Clayton Christensen, author of The Innovator’s Dilemma, touches on some of these issues in a must-read interview on GigaOM. Christensen fingers the “hot money” that flows into sectors looking for a quick return as one of the factors slowing down innovation. He advocates eliminating capital gains taxes on investments of eight years or more, so that VCs will be incentivized to invest for the long term, thereby increasing the quality of innovation by shifting the focus away from things that can be done quickly.

Regardless of what you think of taxes—more of them or less of them—as an answer to our innovation dilemma, it seems clear that the current crisis in the cloud is the product of too many dollars and transistors chasing too few coders and sysadmins. It will take a while for the latter to catch up with the former… unless, of course, another major downturn strikes. It seems ironic that less money could equal more innovation, but it wouldn’t be the first time that a wave of downsizing and tight money boosted productivity.

AMD’s New Make-Or-Break Chip: What You Need To Know

Wednesday’s launch of AMD’s new “Bulldozer” processor is widely seen as a make-or-break moment for the struggling chipmaker. Bulldozer’s architecture is highly unconventional, and it’s being pushed out into the market amid a high-profile leadership shakeup at AMD. It’s also tasked with taking on an Intel that is firing on all cylinders — Intel’s profits are at record levels, its “tick-tock” model for manufacturing process advancement is moving ahead flawlessly, and its upcoming tri-gate transistor introduction at the 22nm process node will give the company a massive one-off boost in efficiency and performance. So with this much riding on Bulldozer, and with this much of an uphill battle ahead of it, how does the launch version of the chip — codenamed Orochi — stack up against Intel’s Core family?

Unfortunately for AMD, the reviews (Anand, Tech Report) are quite mixed. But all is not lost, because Bulldozer could still find a spot in cloud servers.

A look at the first three Bulldozer parts

Bulldozer comes to market in three flavors, the specs of which are listed in the chart below:

| Model | Threads | Base frequency | Turbo frequency | Peak Turbo frequency | L3 cache size | TDP | Price |
|---|---|---|---|---|---|---|---|
| FX-6100 | 6 | 3.3 GHz | 3.6 GHz | 3.9 GHz | 6 MB | 95 W | $165 |
| FX-8120 | 8 | 3.1 GHz | 3.4 GHz | 4.0 GHz | 8 MB | 125 W | $205 |
| FX-8150 | 8 | 3.6 GHz | 3.9 GHz | 4.2 GHz | 8 MB | 125 W | $245 |

The first thing that will jump out to veteran CPU watchers about this chart is that the power consumption (the “TDP” column) and clockspeed numbers are quite high — significantly higher than comparable Intel parts. Bulldozer relies on higher clockspeeds to boost per-thread performance, and the chip pays for that in wattage, drawing more power than comparable Sandy Bridge chips. But Bulldozer also sports more threads per socket (six or eight) than all but the hyperthreaded Core i7, so the extra wattage is amortized over a larger number of threads and the net per-thread efficiency should be similar … at least, depending on the workload, but more on that in a moment.
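
Working just from the chart above, the per-thread share of TDP comes out to roughly the same figure for all three parts; the comparable Intel per-thread numbers aren't listed here, so this shows only the AMD side of the comparison.

```python
# Per-thread share of TDP for the three launch parts, using the chart above.
parts = {
    "FX-6100": (6, 95),    # (threads, TDP in watts)
    "FX-8120": (8, 125),
    "FX-8150": (8, 125),
}
for name, (threads, tdp) in parts.items():
    print(f"{name}: {tdp / threads:.1f} W per thread")
# FX-6100: 15.8 W per thread; FX-8120 and FX-8150: 15.6 W per thread
```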

You’ll also notice there’s no column giving the number of cores per processor — I’ve included only the number of threads. This gets at an interesting architectural wrinkle that is at the root of Bulldozer’s problems and its promise.

Each of the Orochi Bulldozer processors launched today has four “modules” per die. (The three-module FX-6100 still has all four modules, but one is disabled for yield and product binning reasons.) Each of these modules is sort of like a “core”, but not quite.

A typical CPU core consists of a front end, which takes in an instruction stream and sends it to either an integer unit or a floating-point unit for execution. Each Bulldozer module, in contrast, has a front end, two integer units, and a floating-point unit; what’s more, each front end takes in two instruction streams simultaneously, which it can then feed to one of the two integer units.

This makes each Bulldozer module essentially a core-and-a-half, at least as far as integer code is concerned. But a better way to think of a Bulldozer module is as a single, dual-threaded CPU core where the integer units are replicated. A typical processor that supports simultaneous multithreading (SMT) replicates and/or enlarges storage structures like thread state, register files, and scheduling buffers and queues. A ‘Dozer module does all of this same replication, but it also replicates integer execution hardware — it’s this replication of integer execution hardware that is the main difference between a classic SMT design and Bulldozer.

So a four-module Bulldozer part supports up to eight threads of simultaneous execution, and sports a total of eight integer units and four floating-point units. Again, this makes it essentially a four-core SMT chip with double the integer resources.

As for off-chip I/O, Orochi sports four HyperTransport links and a dual-channel DDR3 controller.

Power gating keeps the chip’s idle power down, and a turbo feature lets it ramp its clock speed up in short bursts for extra horsepower.

The part is fabbed on GlobalFoundries’ 32nm high-K SOI process.

How it performs

Benchmark results show that Bulldozer scales well with clock speed increases, so it’s no wonder that AMD has pushed those frequency numbers up. But given the performance of this debut desktop part on the kinds of applications that normal users will want to run, AMD should’ve gotten the clock speed up even higher. For most desktop scenarios, Bulldozer in its current incarnation just doesn’t cut it in bang per buck versus Intel’s Core i5.

Probably the most obvious desktop application category, and one that has historically been near and dear to AMD’s heart, especially post-ATI merger, is gaming. Gaming is also where Bulldozer really falls down, performing at the middle of the pack and often below AMD’s older Phenom chips. It’s also the case that Bulldozer is no great shakes on anything related to image processing and encoding/decoding. But none of this is a surprise.

What image processing and gaming have in common is that both are floating-point intensive workloads that lend themselves to multithreading. Bulldozer has plenty of threads to go around, but as we saw above, every two threads share a single floating-point unit (FPU). So on a per-thread basis, Bulldozer is a bit starved for FPU bandwidth, and this is what keeps those gaming scores low.

The other place where Bulldozer is hurting is in the caching and memory subsystem. Anand’s and Scott’s benchmarks show that Bulldozer has cache latencies that are significantly higher than competing Intel chips, and this hurts performance on most types of code. Bulldozer also has only one dual-channel DDR3 controller to service eight threads of execution. Again, performance would no doubt greatly improve — especially on floating-point code, which is also typically bandwidth intensive — with another dual-channel controller.

So the double-whammy of scarce memory and FPU resources makes for a relatively weak showing versus much cheaper Core i5 parts in games and most media applications. This alone is going to be deadly to Bulldozer’s aspirations on the desktop.

Integer-intensive applications tell a different story, though, and this is what gives me some hope that Bulldozer will find a home in the datacenter. The chip did very well on Scott’s Zipfile compression benchmarks — this is a classic multithreaded integer workload, so it plays to the chip’s strengths. It also did well on encryption benchmarks, which are similarly integer-bound and amenable to parallel processing. This latter showing is especially important for ecommerce, where servers may have to handle a number of encrypted connections.

But these two bright spots are pretty small compared to the rest of the benchmarks where Bulldozer just didn’t measure up. In all, it’s a pretty thin reed on which to hang the case for AMD’s resurgence in the server market.

To make headway, AMD will have to get Bulldozer’s clockspeed and per-thread performance up, and will have to do something about the memory and caching bandwidth situation. I’m assuming that the Bulldozer server parts will have another memory controller, bringing the total number of DDR3 channels up to four; this will help.

The more fundamental problem is that Bulldozer’s novel architectural choices aren’t an obvious success. What AMD has done is essentially double down on simultaneous multithreading, a technology whose performance was always very workload-dependent. This means that Bulldozer’s performance will be very sensitive to the workload type, no matter what AMD does. It may turn out, though, that some types of common cloud workloads will be a good fit for Bulldozer, and will give the part a performance/watt or performance/dollar advantage versus Intel. But right now, this is just speculation. What is certain is that AMD needed a home run, but Bulldozer — at least in this first incarnation — is a double at best.

Securing The Cloud: Questions and Answers

While cloud computing is easily one of the biggest trends of the moment, a survey from the Institute for Business Value found, perhaps not surprisingly, that 77 percent of respondents believe that adopting cloud computing makes protecting privacy more difficult; 50 percent are concerned about a data breach or loss; and 23 percent indicate that weakening of corporate network security is a concern.

Cloud presents a new consumption and delivery model that allows users to rapidly deploy resources, which can easily be scaled up and down, with processes, applications, and services provisioned on demand. Cloud infrastructures and platforms present considerable advantages as users increasingly want to access applications on tablets and other increasingly pervasive devices.

However, given that this model changes how data is handled and secured, it is essential to consider which cloud computing models are most appropriate for an organization. The big questions with cloud security boil down to: Where is my data? Who will be able to access it? And how will I be able to maintain oversight and governance?

What Is Security in the Cloud?

Securing the cloud is sometimes described as a datacenter challenge, sometimes as a software issue, and sometimes as a data or device access issue. In reality, securing the cloud depends on working out where and how to apply the measures that are appropriate for your end users.

Indeed, security services continue to develop, enabling a cloud delivery model and allowing security itself to be increasingly delivered as a cloud service. This applies both to cloud infrastructures and to services in their own right. In essence, the cloud model is evolving into one of the core models for delivering services.

What Has Changed for Security as We Transition to the Cloud?

In considering cloud security, it is essential to understand what changes are involved in adopting cloud computing models. For example, multi-tenant infrastructures require isolation to be built in at the hypervisor, network, and storage layers.

Other, less frequently discussed categories include cloud governance and how to evolve assurance when physical data center inspection is not practical, for example when you are scaling up a service for only a week or even a day.

In looking at each of these, we have had to understand how we can deliver those elements in a cloud delivery model.

A Framework for Building and Articulating Cloud Security

In looking at security, the fundamentals still apply. Building security involves three essential considerations: Have we designed security into how we build the cloud? Have we understood this in the context of what we are trying to do? Have we got security running for these cloud environments?

To communicate our approach, we developed a cloud security reference model. It covers eight categories, ranging from cloud governance, security, and risk and incident management to infrastructure protection and personnel and physical security.

Aspects such as patch management and vulnerability scanning can then be put into context as part of securing infrastructure and protecting against threats and vulnerabilities.

The reference model also allows for setting expectations about what the cloud provider would do and what the customer is expected to do.

Take patch management: it includes determining which patches are available and where they should be applied, both in the environment the cloud provider manages and in the elements the customer chooses to manage themselves. The reference model allows these conversations to happen in a structured way.

Security by Design

This framework is our way of communicating the approach. Layered onto it is a design, build, and consume approach to delivering and enabling cloud security.

This starts with design. Enabling security begins with designing security into the build. The people, processes, and technologies involved are design elements that many of us have established over years of building security, and we are now applying them to the cloud.

The design phase is about understanding that a one-size-fits-all approach to security in the cloud will not work. It is about getting the appropriate security in place for the workload or service that is being considered for the cloud.

The consume phase is about the delivery of security for the cloud, and about ensuring that the services being delivered, and the people, processes, and technology approach to security articulated against the framework and reference model, are understood and appropriate for those services.

In summary, over the past several years security concerns surrounding cloud computing have become the most common inhibitor of widespread adoption. These concerns often translate to: Where is my data? Who will be able to access it? And how will I maintain oversight and governance?

Each cloud model has different features, which changes the way security gets delivered and, in turn, the way we look at security governance and assurance. The task is to determine your desired security posture and to enable cloud in such a way that the new risks can be managed in a rapidly changing landscape.

For those of you looking for more of the questions that need to be asked, or for more about how you can deliver security in the cloud, there is more information here. Or you can contact the author, Nick Coleman: coleman@uk.ibm.com

Proxy Wars: Client-Aware Cloud Rhetoric Heats Up

Photo by Noah Shachtman

There’s a war going on for the right to serve your browser pages, but it’s not being fought by the companies behind the servers themselves. Rather, proxies—systems that sit in between your browser and the server you’re connecting to—are competing to be the middlemen in your web experience. Amazon’s Silk launch brought the issue of proxies into the limelight by creating a client that integrates tightly with a specific proxy, but this is a battle that has been playing out over the past few years among service providers that cater to site owners. And, as is typical of such battles, the rhetoric is starting to heat up.

Exhibit A is a response to the Amazon Silk announcement by Matthew Prince, the CEO of CloudFlare, posted by Om Malik. Unsurprisingly, the CloudFlare CEO is talking his book. CloudFlare is a proxy that site owners can use to protect against hacking, DoS attacks, sudden bursts of traffic, and other types of threats that can bring down a site. So instead of focusing on the client side of the experience like Silk, CloudFlare is strictly a server-side technology that site operators can deploy via a pretty straightforward DNS change. Still, it’s likely that the two technologies will butt heads at some point as they both vie to insert themselves in between the browser and server, and Prince’s public “questions must be raised” attack does make it seem that CloudFlare is threatened by Silk.

Most of Prince’s letter is a mix of privacy questions that have been widely raised and classic FUD, but there is one part of it that raises a good point that I’ve not seen elsewhere:

Unlike existing proxies (like CloudFlare) or traditional CDNs whose clients are the website owners, Amazon’s clients are the web browsers, so they are copying content without the content owners’ explicit permission. This could lead to copyright headaches. While there are safe harbors for service providers caching content, Amazon’s nebulous status between network provider, retailer, and even publisher could muddle their case in court and make them a tempting target. The more Amazon alters the content in order to increase performance, the more jeopardy they will put themselves in.

It’s hard to imagine the same media companies that Amazon has inked huge distribution deals with suing the company over the fact that it caches copies of content and does some compression and resizing. But I could easily see smaller players looking to make a quick buck leveling such a suit at Amazon. This is the kind of thing that an enterprising lawyer might jump on, possibly successfully. It’s not surprising that Prince saw this angle, given that he also teaches cyber law at the John Marshall Law School.

Speaking of proxies, Skyfire (see the post below) is another example of a place where these middlemen are playing an increasingly large role in the browsing experience. Skyfire’s focus is strictly on video, whereas CloudFlare catches all of the in-bound HTTP traffic for a domain and does its caching, optimizing, and security checking. As the cloud gets bigger, these proxies are going to proliferate and possibly segment by traffic/service type.

Gmail nightmare: even if it’s “in the cloud” you still have to back it up

James Fallows at the Atlantic writes a terrifying and personal account of what it’s like to lose one’s entire Google archive to a hacker. Fallows’ wife’s Gmail account was hijacked by a spammer, who not only stole her identity and tried to phish money from her personal contacts, but also vandalized the account by permanently deleting everything in it. She eventually regained control of the account using Google’s standard protocol for these situations, but:

When she looked at her Inbox, and her Archives, and even the Trash and Spam folders in her account, she found—absolutely nothing. Of her allocated 7 gigabytes of storage, 0.0 gigabytes were in use, versus the 4+ gigabytes shown the day before. Six years’ worth of correspondence and everything that went with it were gone. All the notes, interviews, recollections, and attached photos from our years of traveling through China. All the correspondence with and about her father in the last years of his life. The planning for our sons’ weddings; the exchanges she’d had with subjects, editors, and readers of her recent book; the accounting information for her projects; the travel arrangements and appointments she had for tomorrow and next week and next month; much of the incidental-expense data for the income-tax return I was about to file—all of this had been erased. It had not just been put in the “Trash” folder but permanently deleted.

If you’re a Google Apps user, you must have a backup, period. I personally use IMAP for this purpose, but there are other products like Spanning Backup for Google Apps, which has a reasonably priced individual option. I actually just signed up for a trial account after reading the Fallows piece, because you can never be too careful.
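
For what it's worth, the IMAP route can be scripted with nothing more than Python's standard library. The sketch below is a bare-bones illustration with placeholder credentials; a real backup job would want incremental fetching, error handling, and proper authentication rather than a plain password.

```python
# Bare-bones Gmail backup over IMAP, assuming IMAP access is enabled on
# the account. Credentials and mailbox name are placeholders.
import imaplib
from pathlib import Path

IMAP_HOST = "imap.gmail.com"
USER = "you@example.com"            # placeholder
PASSWORD = "app-specific-password"  # placeholder

backup_dir = Path("gmail_backup")
backup_dir.mkdir(exist_ok=True)

conn = imaplib.IMAP4_SSL(IMAP_HOST)
conn.login(USER, PASSWORD)
conn.select('"[Gmail]/All Mail"', readonly=True)

# Fetch every message and write it out as a raw .eml file.
_, data = conn.search(None, "ALL")
for num in data[0].split():
    _, msg_data = conn.fetch(num, "(RFC822)")
    raw_message = msg_data[0][1]
    (backup_dir / f"{num.decode()}.eml").write_bytes(raw_message)

conn.logout()
```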

This lesson goes for anything that you put in the cloud: your login credentials represent a single point of failure, and a vandal can get in and just totally ruin your life. Related to this, here’s another tip: if you keep sensitive data on Dropbox (Quicken files, passwords, insurance docs, etc.), be sure to store it on an encrypted volume. I use an encrypted sparseimage on OS X, which is perfect for this kind of application. A sparseimage is striped so that when you modify it, only the modified parts of the file have to be re-synced to Dropbox. This way, you can use a multi-gigabyte image without having to worry about the entire thing needing to be re-uploaded every time you make a small change to a file.

IBM Extends Reach In Supercomputing, Big Data With Platform Computing Purchase

Photo by Sam Gustin/Wired.com

IBM has announced that it’s buying high-performance computing powerhouse Platform Computing for an undisclosed sum. With a client list that spans the typical HPC verticals (from CERN to Citigroup), the Ontario-based company’s main area of expertise is parallel computing, and it is mostly a player in the classic grid/cluster market—more “Top 500 Supercomputer List” stuff than “cloud computing.” However, you need plenty of storage bandwidth and parallel compute horsepower to run analytics on the data that the cloud generates, and in this respect Platform Computing has a growing footprint in the Big Data space.

Monash Research has the definitive blog post on this acquisition—mostly consisting of notes from a briefing with Platform Computing that the author took in August—and it seems likely that the company’s new MapReduce offering is what IBM is after with this purchase. The MapReduce product supports a mix of Hadoop and non-Hadoop workloads with much lower latencies than the typical Hadoop implementation; it also includes a number of other features to enhance reliability, monitoring and workload management, availability, and so on.

“IBM considers the acquisition of Platform Computing to be a strategic element for the transformation of HPC into the high growth segment of technical computing and an important part of our smarter computing strategy,” said Helene Armitage, general manager, IBM Systems Software. “This acquisition can be leveraged across IBM as we enhance our IBM offerings and solutions, providing clients with technology that helps draw insights to fuel critical business decisions or breakthrough science.”