Author: Jordan Novet

  • Pentaho picks up Webdetails to bring richer visualizations to analytics products

    Business-intelligence provider Pentaho has acquired Webdetails, a design-focused consultancy that has previously done work for Pentaho. The deal will help Pentaho offer richer visualization tools, which are becoming more important as businesses become more data-driven and more employees need help working with data and drawing actionable insights from it.

    Terms of the deal were not disclosed.

    Portugal-based Webdetails’ 20 or so employees have impressive skills as visual artists and a sense of how data scientists want to explore data, Pentaho CEO Quentin Gallivan said in an interview on Friday. They also have the know-how to produce compelling embedded analytics tools for Pentaho customers. Many customers want embedded analytics to be customized with special user-interface details and custom skins, Gallivan said, “so it looks exactly like their application. It can’t look like a (business-intelligence) tool.”

    The Webdetails staff has previously made Pentaho plug-ins such as tools for creating dashboards. The Webdetails employees will also contribute visualization input to the summer and fall releases of Pentaho’s Instaview feature for analyzing data sets from Hadoop, MongoDB and other sources, Gallivan said.

    The data-analytics market has been busy of late. QlikView maker Qlik Technologies hit Nasdaq in 2010, and earlier this month Tableau Software filed to go public, too. Meanwhile, legacy players such as MicroStrategy and SAP have stuck around in the business-intelligence space. To keep adding customers in such a lively area, other players need to bolster their own offerings and help further democratize data. That’s what seems to be happening with this deal.

    Related research and analysis from GigaOM Pro:
    Subscriber content. Sign up for a free trial.

        

  • Feedly survives the outages from the post-Google Reader rush, adding users, feeds and maybe revenue

    Feedly, which has emerged as one of the best replacements for Google Reader in the wake of the announcement that Google will abandon the RSS service, has been taking on millions of new users and at the same time steadily pushing out new features. But the growth in users hasn’t been completely uneventful.

    Feedly Co-founder Cyril Moutran

    In the five weeks since Google said it would shutter Reader later this year, the Feedly site has gone down two times, co-founder Cyril Moutran told me in an interview this week. The first time came right when the Google Reader announcement was made. There was a “huge load on our server,” Moutran said. “It just came, slammed us really, really fast. … What broke for us was really bandwidth. Basically, just having so many users coming in, the bandwidth was just everybody was coming in, and the servers were not responding.”

    So engineers moved static content off the Feedly servers inside a data center and, somewhat ironically, onto Google App Engine, which scales very nicely, Moutran said. Dynamic content stayed put on the Feedly servers, which store terabytes of data, including indexed content from the feeds users subscribe to.

    Less than a week later, Moutran said, “we saw another really, really crazy spike.” The site went down again. Developers took a look at the code that communicates between the client and Feedly servers, and tried to make the client more efficient, thereby reducing the load hitting the servers. “Then we had to order some more hardware,” Moutran said — load balancers, to be specific.

    That second outage came on a Monday. As it turned out, Feedly gets more traffic on Monday than on any other day, and generally speaking traffic is higher on weekdays than on weekends. Desktop traffic picks up at around 8 a.m. local time and decreases around 6 p.m. Why? Many Feedly users look to the service “not so much in a casual context but more to catch up with what’s going on with the industry,” Moutran said. People use Feedly for work, in other words. Lawyers, designers, and writers are typical business users.

    As many more users get on board — more than 3 million had joined since the Google announcement as of April 2, on top of 4 million users active before the announcement — more feeds pile up. The number of feeds is now up to 100 million, Moutran said.

    With many more business users and a greater variety of content, monetization is a bigger question, and Feedly feels it must accelerate its efforts in that direction. The company, which is based in Palo Alto, Calif., and has 10 employees, is now looking at how it will introduce a premium or pro version later this year. Feedly could also generate revenue by providing streams of publishers’ premium content inside the desktop and mobile versions of the application.

    While plenty of people find Twitter handy for getting news, the migration of millions to Feedly shows the desire for a strong RSS reader still exists. If that desire keeps steady and if Feedly can keep adding features that interest users, it could turn Google’s trash into Feedly’s treasure.


  • Cloud keeps looking good for IBM, and flash storage could help in the months ahead

    IBM reported $23.4 billion in revenue in the first quarter of the year, down more than 5 percent year over year, and $3.03 billion in net income, down 1.1 percent. Analysts had been expecting more than $24 billion in revenue.

    The company said cloud-computing revenue was up more than 70 percent year over year in the first quarter of 2013, following an 80 percent revenue gain in the previous quarter.

    Revenue from software-as-a-service (SaaS) products, including Tivoli software, was up 65 percent, Mark Loughridge, senior vice president and chief financial officer for finance and enterprise transformation, said during a Thursday call with investors. Revenue from enterprises setting up their own private clouds with IBM Infrastructure-as-a-Service (IaaS) offerings was up “more than 75 percent,” Loughridge told investors. As was the case last quarter, Loughridge didn’t break out exact revenue for the products, so we have no real idea where IBM’s cloud business stands.

    Revenue decreases came in several categories, including hardware. Power Systems server sales dropped more than 30 percent year over year.

    How will IBM shore up revenue going forward? Increasing revenue in growth markets will be an area of focus, Loughridge said, and he also pointed to flash-memory products as a way to get more growth from storage later this year. Just last week IBM said it would commit $1 billion to flash, building it into more hardware and opening 12 facilities around the world to prove the power of flash to customers.

    Last week it seemed like a nice idea. Now it looks like a more important strategic move.


  • 4 networking startups to watch: Open Networking Summit edition

    Hype and excitement still surround software-defined networking (SDN). For evidence, look no further than the continuing disagreement on the definition of SDN. Yet, surprisingly, there were few startups at this year’s Open Networking Summit, which wrapped up Wednesday. But a few on the exhibition floor are worth writing home about.

    • One Convergence, based in Santa Clara, Calif., is playing the easy-to-use card with the OpenStack-based network-virtualization overlay software it was demonstrating at the conference. It will launch in the coming months. The One Convergence offering is similar in some ways to Nicira’s Network Virtualization Platform, said Roshan Gudapati (pictured), vice president of marketing and sales. “As traffic increases, you can bring up more and more (network) instances,” he said. It’s just as easy to hit the delete button on a network, as Gudapati demonstrated at his company’s booth. (Gudapati later said the One Convergence software is similar in some ways to the approach Nicira had taken in the past.)

    • Saisei Networks, based in Sunnyvale, Calif., is working on software for traffic management — the act of setting and enforcing policies that can limit the traffic over a network. The idea is to deliver a product that eliminates the need for purpose-built legacy hardware and works at carrier grade, said Dave Newman, Saisei’s co-founder and chief operating officer.
    • EstiNet, based in Hsinchu, Taiwan, has a network simulator to see what would happen if an OpenFlow controller were to send packets across multiple OpenFlow-enabled switches. The software visualizes the route of packets across switches and lets customers figure out the most efficient path and simulate the failure of switches in a network.
    • Accelera Mobile Broadband Inc. is combining two favorite GigaOM topics, heterogeneous mobile networks (het-net) and network virtualization. The company, which was founded in 2009, is trying to use network virtualization to set policies and take action across the next generation of wireless networks that combine Wi-Fi, LTE, 3G technologies and sometimes even older networks.

    As conference attendees heard this week, investors are still looking for SDN startups to fund. Perhaps these companies will help meet the investor demand. Perhaps still more startups need to start up.

    This post was updated at 5:31 p.m. PT to add context to Gudapati’s statements at the conference.


  • Facebook throws down efficiency gauntlet with real-time data and open-source dashboards

    When I first visited Facebook’s data center in Prineville, Ore., in 2011, I felt privileged to spot some figures on the facility’s power-usage effectiveness (PUE) on a screen affixed to a wall. The PUE number, which indicates how much of a facility’s energy actually reaches the computing gear rather than overhead such as cooling, wasn’t exactly what some reporters wanted to know — total number of megawatts would have been better than PUE, and that sort of information came later — but it was a start toward transparency. Now, the PUE data won’t be such a big deal to catch a glimpse of anymore.
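
    PUE itself is a simple ratio: total facility power divided by the power that reaches the IT equipment, so 1.0 is the theoretical ideal. A minimal sketch with made-up readings (the numbers below are illustrative, not Facebook's reported figures):

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power-usage effectiveness: total facility power over IT power.

    A value of 1.0 would mean every watt reaches the computing gear;
    anything above that is overhead such as cooling and power
    distribution losses.
    """
    return total_facility_kw / it_equipment_kw

# Hypothetical minute-level readings, like those shown on a dashboard
print(round(pue(total_facility_kw=1070.0, it_equipment_kw=1000.0), 2))  # prints 1.07
```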

    The social-networking giant is giving the general public access to near-real-time dashboards on PUE and another key measurement, water-usage effectiveness (WUE), alongside humidity and temperature data for its data centers in Prineville and in Forest City, N.C. Previously, the PUE and WUE figures were released quarterly. The new dashboards show data down to the minute, albeit with a two-and-a-half-hour lag. In the future, Facebook will also post a PUE and WUE dashboard for the data center it’s building in Luleå, Sweden.

    The facilities are still under construction, and, as a result, the data in the two dashboards can have abnormalities, but it should become more stable over time. The company detailed its plans in a Thursday blog post on the Open Compute Project site.

    Facebook’s power-usage effectiveness (PUE) and water-usage effectiveness (WUE) dashboards

    To prod other companies operating data centers to share more up-to-date power- and water-usage data, Facebook will open-source the code for the dashboards. Similar data from other companies could make Facebook look good, as Facebook (along with Google) is on the leading edge when it comes to PUE. eBay, for its part, has released a dashboard showing PUE and WUE as well as other measurements, such as the number of checkout transactions per kilowatt-hour.

    Innovations in hardware and software at Facebook’s data centers make lower energy use possible. Whether Facebook will be able to squeeze even more computing power out of its energy and water consumption is an interesting question, and now that more current data is being shared, it’s worth asking what innovations will come in the future. If Yahoo, Microsoft and others follow suit, the pressure will be on for data centers across the board to become more transparent. Those efforts could help data center operators respond to notions that data centers waste energy.


  • Yes, people really are still debating the definition of SDN

    Thankfully, many speakers at the Open Networking Summit in Santa Clara, Calif., this week agreed on the definition of software-defined networking (SDN): the separation of the control plane and the data plane. But despite the growth of the conference, what people could do with SDN wasn’t completely clear, leaving enterprises unsure of what’s possible, what’s not and which vendors can solve which problems.

    Bruce Davie, principal engineer at VMware, took the stage with a provocative message: SDN has promised many things, but most of those things are being delivered with network virtualization, and SDN isn’t necessary. Network virtualization, of course, is the preferred term for Nicira, which VMware bought last year for $1.26 billion and subsumed into its own product last month. Davie said he often sees people claiming that SDN can let network administrators do application-level programming of networks, easily provision and manage networks, improve performance and add bandwidth. But Davie has his doubts about that.

    After Davie’s address, I ate lunch with a bunch of network engineers from a Fortune 100 company who are getting pressure from executives to lower capital and operational expenditures with SDN. But they’re just not sure what to do. They need to find something fast, but they also don’t want to bring any more risk into their data center. That’s why implementing open-source code on the way from the OpenDaylight Project on top of white-label network gear might not be as good a choice as sticking with expensive hardware with reliable support from a legacy vendor. These guys already hear different definitions from different vendors. When they hear that, no, SDN is not going to make networks programmable, they only become more uncertain of what to choose.

    It’s a little easier to see why about half of more than 200 enterprise network administrators surveyed earlier this year couldn’t identify the correct definition of SDN. And so it seems we still are arguing over definitions, that we are still in the hype cycle, that we are still trudging through the FUD phase.


  • OpenDaylight optimism persists, along with questions

    Since the network vendor-led OpenDaylight Project came to light last week, the tech press, bloggers and even some industry people have expressed doubts about the consortium’s prospects. But at the Open Networking Summit in Santa Clara, Calif., this week, some attendees sounded optimistic about what could come out of OpenDaylight, as it could broaden adoption of software-defined networking.

    It’s true that if useful vendor-agnostic code for many networking components is to come out of OpenDaylight, participants will have to clear several hurdles. Some of the 18 companies sponsoring OpenDaylight, such as Juniper and Cisco, compete with each other, and developers might end up having to maintain controller code that works best with certain kinds of networking appliances. The tech press has brought up that possibility, and it’s not completely unfounded; a Big Switch spokesperson has questioned how Cisco specifically will interact with everyone else when it comes to giving code the OpenDaylight stamp.

    The lack of customer leadership in OpenDaylight — unlike, say, the Open Networking Foundation (ONF), which has board members from Yahoo, Goldman Sachs and other non-vendors — has been another area of contention, though perhaps not for lack of trying. When executives from Cisco and IBM were organizing the OpenDaylight Project a few months ago, they reached out to Google and NTT, but neither company got on board, said Vijoy Pandey, chief technology officer of IBM network operating systems. Companies other than network vendors could still jump into the project in the coming months, though.

    The role of the ONF, which nurtures the development of the OpenFlow networking protocol, is another open question. In public remarks at the Open Networking Summit on Tuesday, ONF Executive Director Dan Pitt said OpenFlow is a “substrate (that) allows you to build things like open-source software.” He said he didn’t think OpenDaylight would have been possible if there hadn’t been “something to build upon.” Asked if he or the ONF will get involved with OpenDaylight, Pitt said he had no information along those lines.

    Even so, the OpenDaylight Project is “much more of a meritocracy” than the ONF, said Dave Husak, founder and CEO of Plexxi, which has paid five digits to be a silver OpenDaylight member. He views OpenDaylight as a vehicle for promoting Plexxi algorithms and application programming interfaces, which Plexxi will contribute to the project. At the same time, OpenDaylight could surely benefit companies that seek to do more with their networks.

    As much as I might want to predict the future and approximate the outcome of OpenDaylight, I’m afraid I can’t do that, and I haven’t found anyone here who can. They all say they’ll have to wait and see. And so will I.


  • Instart Logic gets $17M to improve mobile site performance with secret SaaS

    Instart Logic, a company looking to share with businesses its technology for making mobile sites load faster, has emerged from stealth mode after two years with $17 million in Series B venture funding. The need for this sort of latency-lowering service will grow as mobile use keeps rising and consumer-facing and enterprise-focused companies look to provide richer mobile offerings.

    Even though Instart has emerged from stealth mode, the company is keeping the technology behind its Software as a Service (SaaS) a secret for now. (It won’t say how many customers it has, either.) But suffice it to say that the Instart software won’t require any major reworking on the part of developers. “We do not impose any burden (that) you need to change your code or change development methodology,” said Manav Mital, a co-founder and the company’s CEO. “The process is very easy to come in.”

    At the moment, Instart, based in Mountain View, Calif., is aimed primarily at web-based applications, not native apps, Mital said.

    Tenaya Capital led Instart’s Series B round, alongside contributions from Andreessen Horowitz, Greylock Partners, Sutter Hill Ventures and other investors. Total venture funding in Instart is now at $26 million. The new funding will help Instart add sales and marketing staff and do more work on its product.


  • VCs will fund more software-defined networking startups as enterprises sit on the sidelines

    “Just because it’s very early in this market doesn’t mean it’s too early to start an SDN company. It’s always too early before it’s too late.”

    There was Shirish Sathaye, a general partner at Khosla Ventures, calling out at this year’s Open Networking Summit on Tuesday for more software-defined networking startups for him and others to fund. His firm saw the promise of Big Switch Networks a couple of years ago. It seems the hunger for SDN companies to back has not fallen away. It’s not particularly surprising in this golden age of enterprise IT, in which VCs are hunting for enterprise plays following the rise of SDN and other disruptions.

    The other side of the coin is that production-scale use cases from enterprises are still hard to find. Managed-hosting service providers have been quicker to try out the benefits of SDN; after all, those companies can improve their bottom lines by rolling out new services and potentially lowering capital and operational expenditures. Enterprises, meanwhile, have looked more hesitant to dive in and see what’s possible.

    That level of market penetration — where large data centers and Wall Street banks give SDN a shot — could happen in 2014 or 2015, said Paul Santinelli, a partner at North Bridge Venture Partners who has backed Embrane.

    The key for investors is not whether SDN startups can present software that impresses network administrators but whether they can solve real business problems, said Alex Benik, a principal at Battery Ventures. “I think it’s kind of incumbent on the community to work together to not overpromise and underdeliver on what SDN can deliver on a reasonable time scale,” Benik said. That delivery, he said, is not coming right away, but more like the end of this year and throughout 2014. With SDN code on the way from the OpenDaylight Project, founders who have been working on open controllers and other components in stealth mode could now be pivoting, so VCs might have to wait a while for the most promising SDN solutions.


  • Idibon secures $1.4M as it builds a tool to mine the world’s languages

    Idibon, an ambitious stealth-mode startup, has closed on $1.4 million in seed funding from Khosla Ventures to keep building out natural-language-processing software. The software helps enterprises get insight into sentiments expressed in text on the internet in any language you can think of — with a small role reserved for human beings.

    The San Francisco company doesn’t want to reveal how everything works yet. But previous work from Idibon CEO Rob Munro provides hints of what’s possible. In his 2012 Stanford Ph.D. dissertation, entitled “Processing Short Message Communications in Low-Resource Languages,” Munro explained that it was possible to build natural-language-processing systems that could classify text messages and tweets in Chichewa, Haitian Krèyol and Urdu despite many variations in word spelling, even when the systems had little time to train and improve and no previous familiarity with the languages. In the case of the texts in Haitian Krèyol sent following the January 2010 earthquake in Haiti, such prioritizing helped quickly sift out the genuine emergencies. The question is whether a tool could be developed to pick up patterns in text in any language. Such a system, if combined with a powerful translation tool, could be deployed for a wide variety of applications, from sentiment analysis to intelligence gathering.
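
    The spelling-variation problem Munro tackled can be illustrated with a toy normalizer; this is purely a sketch of the general idea, not Idibon’s or Munro’s actual method:

```python
import re
from collections import Counter

def normalize(token: str) -> str:
    """Collapse common SMS-style spelling variation: lowercase the token,
    drop punctuation and digits, and squeeze characters repeated three or
    more times ('helppp' -> 'help')."""
    token = token.lower()
    token = re.sub(r"[^a-zà-ÿ]", "", token)     # keep basic and accented letters
    token = re.sub(r"(.)\1{2,}", r"\1", token)  # squeeze long repeats
    return token

def features(message: str) -> Counter:
    """Bag of normalized tokens, usable by any standard text classifier."""
    tokens = (normalize(t) for t in message.split())
    return Counter(t for t in tokens if t)

# "Nou bezwen dlo" is Haitian Krèyol for "we need water"
print(features("HELPPP!! nou bezwen dlo dlo"))
```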

    Rather than leave machines to bear the burden of figuring out what people mean when they communicate in obscure languages, Idibon wants humans to play a role, such as verifying that data is correct. That sort of work could be crowdsourced. “Machines are never going to be 100 percent accurate,” Munro said. The idea of bringing together humans and algorithms to solve problems has come up in other applications, and several came up in on-stage conversation at GigaOM’s Structure:Data 2013 conference in New York last month.

    How could enterprises use Idibon? Half a dozen customers are using the beta version of the software in different ways. One is relying on Idibon to run a medical question-and-answer system that can spit out an answer or possible answers. And “a sales organization” is using Idibon to rifle through news articles, blogs and other documents to map relationships among people and organizations and point to past acquisitions, Munro said. It’s also possible for Idibon to process information from multiple languages to serve up data for business-intelligence applications.

    For now, Idibon is “just a simple API service,” Munro said. Some direct integration of the Idibon data is happening, too. The software takes in unstructured data — from tweets, instant messages, emails and so on — processes it and responds with structured data, he said. Ultimately, though, “we want to become the leading organization for scalable cloud-based natural-language processing,” Munro said.

    English comprises a small fraction of all communication — roughly 375 million people call English their first language, out of more than 7 billion people in the world — and that’s why a tool with more universal linguistic powers sounds so appealing. While not many enterprises might be looking to capture data in little-known languages now, it could become essential in the coming years. If Idibon can come out with a product soon, it could be the beneficiary of a sort of international arms race for truly global understanding.


  • Liquid server cooling gains a few more backers, and enterprises could follow

    In the next few months, two webscale companies will make announcements about plans to immerse their servers in mineral oil and set them in special racks on production scale, which could help operators save on energy costs, according to a recent report from Quartz. Meanwhile, the company with the mineral oil and special racks, Green Revolution Cooling, is in talks with a number of other webscale companies about production-scale implementation, CEO Christiaan Best told me. The news is a sign that more commercial data center operators are getting over their fears of mixing servers and liquid.

    Interest in liquid cooling — as opposed to standard air cooling — in applications other than high-performance computing has been slowly rising since Austin, Texas-based Green Revolution and another provider, Iceotope, came out of stealth mode in 2009, but Green Revolution in particular has seen a tidal wave of inquiries in the past nine months or so, Best said.

    Google has shown interest, Green Revolution got a shout-out from Amazon Web Services Distinguished Engineer James Hamilton, and Intel liked the results of a year-long test of servers inside the Green Revolution gear. Since the Intel news, there were “a couple big people who started testing us, and those people have been talking,” Best said.

    Enterprises are more risk-averse than webscale companies and don’t care as much about cost savings, but Best said he thinks wider enterprise adoption could be just three to five years away.

    The question is whether more data center administrators will be able to wash their hands of concerns about removing fans from servers, making hard disk drives liquid-resistant (or just using solid-state drives), bringing in specialty racks and — not to mention — splattering oil on themselves. GigaOM Research Analyst Pedro Hernandez pointed out these issues (subscription required) in late 2009.

    But webscale companies can skip the process of modifying servers to fit liquid-cooled racks and just buy custom servers through legacy vendors or lesser known manufacturers with original-design roots that are emerging as direct sellers, such as Quanta. And with more webscale companies rolling up their sleeves, Best’s enterprise forecast isn’t so hard to believe.


  • Networking startup NoviFlow announces fast OpenFlow switch

    Networking startup NoviFlow is trying to get ahead in the OpenFlow networking race by bringing out a switch capable of running on the OpenFlow 1.3 protocol at up to 200 Gbps.

    More and more IT people are coming to understand and express interest in implementing OpenFlow, which separates the control plane from the data plane and lets servers take charge of telling switches what to do with packets. As the trend takes hold, NoviFlow will surely have to put up with fierce competition, as more vendors move to make their switches OpenFlow-compatible and as OpenFlow-friendly code from the OpenDaylight Project hits the market.

    The news comes a year after NoviFlow was founded and just a few months after NoviFlow promised switches that could deliver 100 Gbps. Clearly NoviFlow is serious about capturing market share as enterprises consider OpenFlow options and at least think about moving away from legacy vendors such as Cisco.

    Cisco, of course, downplays the threat to its bread-and-butter business and is taking steps to protect its market-leading position in switches and routers and enviable profit margins.

    “I don’t see this (software-defined networking and network-function virtualization) as a commoditization threat whatsoever to Cisco,” said David Ward, Cisco’s chief technology officer for engineering and chief architect, during a call with investors on Thursday. Startups like NoviFlow are hoping to prove Ward wrong.


  • Teradata to connect Hadoop and data warehouses, roll out new appliance

    Teradata on Monday said it will let data-warehouse appliance owners quickly and easily supplement analysis of data stored in the appliance with data processed in Hadoop. The idea is to make it easier for more users to benefit from Hadoop and keep performance high.

    Teradata also announced the new Active Enterprise Data Warehouse 6700, which comes with fast Mellanox InfiniBand networking gear and Intel Xeon E5 processors. The box provides 40 percent better compute performance than the previous model, the 6690, and can handle up to 61 petabytes of data.

    Teradata’s Enterprise Access to Hadoop is easier to use than mere Hadoop connectors, said Chris Twogood, vice president of product and services marketing. Business analysts can transfer data easily on their own, without calling on Hadoop experts, he said. At the same time, mission-critical data can stay inside the data warehouse while other data, such as tweets and log files, stays in the Hadoop cluster.

    The connection between a data warehouse and Hadoop distribution is helped along by a partnership Teradata formed with Hortonworks last year.

    Several other vendors offer support for running SQL queries on Hadoop, including EMC’s Greenplum, IBM’s Netezza and Microsoft with its SQL Server. One moving part here is whether to split up appliances for data warehousing and Hadoop implementations. For Teradata, the answer to the splitting question is a resounding yes. With customers as large as Apple, eBay and Wal-Mart running the company’s gear, the Teradata way should hang around for a while.


  • Forget data transparency: options grow for letting you hide your data

    There’s no doubt it’s a data-driven world. But increasing concerns about how companies collect and use internet users’ personal data have given rise to a few solutions.

    One is a personal data locker where users would be able to store their own information and grant companies limited access, rather than abide by companies’ privacy policies. Some people have even talked about compelling companies to disclose the data they keep on consumers, even though it might be hard to understand and use.

    But others are simply opting out of the data revolution.

    Stopping tracking in its tracks

    One CEO has in mind an approach that comes from the opposite direction. Rather than ask companies to disclose more, Bill Kerrigan, the chief executive of Abine, believes internet surfers should avoid letting companies detect their activity in the first place, or at least try to limit the amount of new data companies can gather and tie to the information they already hold about end users.

    Abine introduced its browser extension for blocking online tracking in February 2012. The DoNotTrackMe extension is free, although the company charges for another service: the (temporary) removal of information from popular online data collectors such as Spokeo and ZabaSearch. And later this year, Kerrigan said, Abine will release a service for consumers to get proxy email addresses and phone numbers for plugging into websites that demand that information.

    Besides Abine’s DoNotTrackMe feature, there are other options for preventing tracking. Free privacy and security software from AVG includes the option, for example. There’s also PrivacyChoice’s free Privacyfix web application, which displays the sites that have installed cookies on a computer for tracking activity and the data being shared through Facebook, Google and LinkedIn. Internet Explorer 10 was released last year with the do-not-track option in place by default, putting Microsoft on the side of privacy advocates, not advertisers.
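    None of these extensions publish pseudocode, but blocklist-based tracker blocking of the kind DoNotTrackMe and Privacyfix perform generally boils down to checking each outgoing request’s host against a list of known tracker domains. A minimal sketch, assuming an invented blocklist (real products ship curated lists and also handle cookies and scripts):

```python
from urllib.parse import urlparse

# Invented blocklist; real extensions ship curated lists of tracker domains.
TRACKER_DOMAINS = {"tracker.example", "ads.example"}

def should_block(url):
    """Block a request if its host is a known tracker domain or a
    subdomain of one -- the basic test blocklist-based tools apply."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in TRACKER_DOMAINS)

print(should_block("http://pixel.tracker.example/beacon.gif"))  # True
print(should_block("https://news.example/article"))             # False
```

The subdomain check matters in practice: trackers often serve beacons from hosts like pixel.tracker.example rather than the bare domain on the list.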

    The trouble is, if companies can’t see consumer demographics or preferences, websites might not be able to delight customers with responsive features. For example, without location information, Google Now would be considerably less powerful. At a recent event in San Francisco, Hilary Mason, the chief scientist at bit.ly, raved about Google Now. “For the first time (a product) takes everything (Google) knows about me and actually gives me something I want,” she said.

    Similarly, at GigaOM’s Structure:Data conference in New York last month, executives at other companies that require location and other personal information from users agreed that users are willing to sacrifice personal information if they like what they can get in return.

    Bringing data back

    Andreas Weigend, former chief scientist at Amazon.com and now a consultant and Stanford University lecturer, is in the habit of asking executives what they could do to impress their customers by using data. He also tends to raise the question of how much data, if any, companies should share with their customers.

    For example, should an airline grant a customer access to a recording of his or her most recent phone call to the airline? Weigend put the question to David Cush, president and CEO of Virgin America, at a 2011 conference on big data. (A video shows what happened; fast-forward to 5:40.)

    The problem with pushing for data disclosure on a large scale is it will take a lot of pushing from consumer groups, and opt-in from one company at a time could take many years. Legislation might not be ideal, either, as people could just go to different countries if they don’t like the policies governments set in place.

    For now, both data ownership and data masking have drawbacks. But give this some time. As more companies dream up more ways to target consumers, and consumers become more wary of being tracked and targeted, better solutions to the privacy problem are likely to pop up in response.


  • No SQL or DynamoDB: Airbnb goes with Memcached for Neighborhoods feature

    One of Airbnb’s neat features, Neighborhoods, shows people elegant pages on neighborhoods within big cities that can help them choose exactly where to stay. Actual homes where visitors can stay the night are directly tied to the neighborhood pages. The idea sounds obvious, but it took some engineering tinkering to figure out how to make it all work accurately and quickly.

    On the Airbnb Nerd Blog on Thursday, engineers Andy Kramolisch and Ben Hughes, who worked on Neighborhoods and previously founded NabeWise, a neighborhood guide for American cities, explained the back-end process of aligning locations with neighborhoods.

    Behind the scenes, Kramolisch said, a cartographer carves out the borders of neighborhoods. Then it’s time to match up hosts’ homes with the neighborhoods listed on Airbnb. On the site’s back end, Kramolisch said, the latitude and longitude of available homes are regularly associated with the various neighborhoods in a given city, if those neighborhoods are represented on Airbnb, through an internal system called Glop, short for Genome Location Pipeline. “For example, say you list your place, which is located at (12.333568650219718, 45.43647998034738). The next time Glop runs, it will correctly identify your listing as being in San Marco,” he said.
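    Airbnb hasn’t published Glop’s internals, but the core geometric step — testing whether a listing’s coordinates fall inside a neighborhood’s hand-drawn boundary — is a classic point-in-polygon problem. A minimal sketch using the ray-casting algorithm, with an invented rectangular boundary standing in for a real neighborhood polygon:

```python
def point_in_polygon(lng, lat, polygon):
    """Ray-casting test: count how many polygon edges a ray cast east
    from the point crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude where the edge crosses that latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside

# Invented (lng, lat) boundary standing in for a real neighborhood outline.
san_marco = [(12.32, 45.42), (12.35, 45.42), (12.35, 45.45), (12.32, 45.45)]
print(point_in_polygon(12.333, 45.436, san_marco))  # True
```

Real systems typically hand this off to a geometry library or a geospatial index, but a batch pipeline like the one described would run some version of this test for every listing against every neighborhood polygon in its city.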

    It’s not as if Neighborhoods works with “insane amounts of data,” Kramolisch said. Still, up-to-date data on places to stay needs to be served up quickly, so users aren’t kept waiting in front of their screens. The data changes fast, and an SQL database wouldn’t work because of the “mass updates” involved, Kramolisch said. The engineers considered an internal NoSQL database as well as Amazon Web Services’ managed DynamoDB service, but DynamoDB couldn’t handle Airbnb’s storage needs. So they turned to the Memcached key-value store, which serves data quickly by keeping it in memory.
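    The blog post doesn’t spell out the access pattern, but serving page data out of Memcached usually follows the cache-aside pattern: try the cache first, and only fall back to the slower source of record on a miss. A sketch, with a stand-in class in place of a real memcached client and an invented key scheme and loader function:

```python
class FakeMemcache:
    """Stand-in for a real memcached client; same get/set shape."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value, time=0):
        self._store[key] = value

cache = FakeMemcache()

def load_listings_from_database(neighborhood_id):
    # Placeholder for the real, slower datastore query.
    return ["listing-1", "listing-2"]

def fetch_neighborhood_listings(neighborhood_id):
    """Cache-aside read: serve from memory when possible, otherwise hit
    the source of record and repopulate the cache."""
    key = "neighborhood:%s:listings" % neighborhood_id
    listings = cache.get(key)
    if listings is None:
        listings = load_listings_from_database(neighborhood_id)  # slow path
        cache.set(key, listings, time=300)  # expire after five minutes
    return listings

print(fetch_neighborhood_listings("san-marco"))  # ['listing-1', 'listing-2']
```

Only the first request per neighborhood pays the database cost; subsequent reads within the expiry window come straight from memory, which is how averages like the 35-millisecond figure cited below become achievable.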

    In going with Memcached, Airbnb is making the same choice as Facebook, Etsy and other companies that operate at webscale. Location is the top criterion for Airbnb travelers, Kramolisch says, and the fast service Memcached enables — 35 milliseconds on average, to be precise — is the kind of solution that could help Airbnb focus on giving customers more of what they want from the site, when they want it.


  • IBM ponies up $1B to add flash to more products and speed up big data

    IBM said Thursday it would spend $1 billion to support flash storage in more of its products and open 12 facilities worldwide to show enterprises what a difference flash can make. It also unveiled its new FlashSystems line of flash-storage appliances. The efforts could bring higher speeds for the big-data projects enterprises have been getting into, the company said in a statement.

    Given how hot the flash-storage business has grown in the past year, IBM is smart to do this, although the move could be viewed as a little late in the game. Webscale companies like Facebook have opted for flash memory from flash star Fusion-io and others for quick response times, and flash storage array maker Violin Memory and storage giant EMC are coming out with PCI-Express cards.

    IBM itself last year bought Texas Memory Systems, which makes flash storage arrays and PCIe cards. At the time it said it would add flash into the PureSystems line of products. Now the company is using the Texas Memory technology for its new FlashSystems line and broadening the flash push to even more products. As time passes, the price of storing with flash approaches that of storage on hard disk drives. That means IBM might find more success now than it would have if it had made its $1 billion flash bet, say, a year ago.

    It’s become standard operating procedure for IBM to back hot “new” technologies by writing a big check — and an accompanying press release. It put $1 billion behind Linux in the early 2000s, for example, and in 2007 it said it would make $1 billion available to help IT become more energy-efficient. With $3.8 billion in net income in the third quarter of last year, IBM is just the kind of company that can and does make this sort of big bet. This bet, like the Linux investment before it, could pay off handsomely.


  • Searching for a document hidden in multiple clouds? Simplexo wants to help you with that

    Simon Bain admitted it was a bit selfish. He started working on a product for finding and encrypting files across multiple cloud storage services four years ago because he wanted to do that and no such service was out there.

    Bain, who has spent several years pushing complicated document-management software, just released desktop and mobile apps called SearchYourCloud through his company, Simplexo, to let others search and encrypt documents with the storage services people use today. Bain said SearchYourCloud’s release will be able to ride the wave of a couple of big trends.

    SearchYourCloud acknowledges that employees use Dropbox, Box, Google Drive, Microsoft Exchange and other easy-to-use services for both personal and business use. That’s why the software doesn’t automatically encrypt every file. It only encrypts those files that users select, and it keeps them in a separate SearchYourCloud folder on Dropbox or another service. Meanwhile, SearchYourCloud is also a nod to the fact that employees don’t just use one single cloud storage service. A search of, say, Dropbox doesn’t always immediately turn up a given document.

    Currently, with Windows desktop and iOS apps, Simplexo lets customers search through Dropbox, Exchange and Sharepoint, and it can search through users’ desktop files as well. Support for Box.net, Evernote and Google Drive is on the way on the storage side, and Android and Windows Mobile and desktop Mac support on the app side. Bain said he would like to add the capability for searching across backup instances on Amazon Web Services’ S3 service and other locations later this year.
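    Simplexo hasn’t published how SearchYourCloud federates queries, but the general pattern of cross-cloud search — fan a single query out to each connected service and merge the hits — can be sketched as below. The per-service search functions here are invented stand-ins for the real Dropbox, Exchange and SharePoint APIs:

```python
def search_all(query, backends):
    """Fan one query out to every connected storage backend and merge
    the hits, tagging each result with the service it came from."""
    results = []
    for name, search_fn in backends.items():
        for hit in search_fn(query):
            results.append({"source": name, "file": hit})
    # Sort merged hits by filename so results from different clouds interleave.
    return sorted(results, key=lambda r: r["file"].lower())

# Hypothetical per-service search functions standing in for real APIs.
backends = {
    "dropbox": lambda q: [f for f in ["Report.docx", "notes.txt"] if q in f.lower()],
    "exchange": lambda q: [f for f in ["report-final.docx"] if q in f.lower()],
}
print(search_all("report", backends))
```

A production tool would rank by relevance rather than filename and query the services concurrently, but the merge-and-tag step is what lets one search box answer the question the headline poses.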

    While Bain is eager to gain enterprise adoption of SearchYourCloud, he might be facing a moving target. IT departments could enforce more rules on where documents can be stored, and cloud storage providers could provide cross-cloud search tools. Other companies with either document encryption or cross-cloud searching could add the other feature and compete directly with Simplexo. Until then, SearchYourCloud appears to be a much-needed crutch.

    SearchYourCloud shows search results across various cloud storage services.


  • MarkLogic nets $25M to keep up enterprise NoSQL pitch

    When MarkLogic founder Christopher Lindblad started working on a database for unstructured data in 2001, his efforts were prescient. Since then, the database market has seen a proliferation of non-relational, or NoSQL, startups built to handle the wide variety of data types that new data sources such as web applications and digital documents generate. The space has grown so big, in fact, that it has already started to consolidate. Amid all this, MarkLogic has managed to stand out by generating more revenue than pretty much any other vendor, according to figures Wikibon released in February.

    On Wednesday, MarkLogic’s success was validated again, as the company announced a $25 million round of venture funding, bringing the total it has raised to $71.2 million. Sequoia Capital and Tenaya Capital led the round; CEO Gary Bloom and other MarkLogic executives also contributed.

    MarkLogic likes to tout the fact that it’s geared for enterprise use. Features such as high availability, replication, clustering and ACID compliance help differentiate the company from other NoSQL vendors, Bloom told me. And although the company is taking in revenue and looks robust enough to go public now, Bloom said he would rather boost revenues to the point that MarkLogic could sustain success after an IPO.

    Rather than go after the revenues that open-source NoSQL databases generate, Bloom said he wants to take database market share away from legacy companies peddling SQL databases, including IBM, SAP and Bloom’s previous employer, Oracle. That means MarkLogic salespeople will have to convince slower-to-change enterprises that relational databases might not be the best choice for taking advantage of unstructured data. MarkLogic will also have to contend with fellow NoSQL players, such as MongoDB, that are adding enterprise functions.

    But if MarkLogic’s plan turns out to be fruitful, a public offering could come within a year or two, Bloom said.


  • Intel moves forward on new Avoton microserver chips and rack innovations

    Intel is just a few months away from production of new chips targeting the microserver market, and more powerful chips for other applications are on the way, Diane Bryant, senior vice president and general manager of Intel’s data center and connected systems group, is expected to say at the Intel Developer Forum in Beijing on Tuesday.

    The chip maker wants more developers to try out its products, and to that end it’s opening a cloud innovation center in Beijing where the latest Intel gear will be available for testing and development. Intel is also working on reference architecture to redesign racks and rethink the placement of the elements inside of them in hopes of influencing microserver computing deployments.

    Aiming at microservers

    While the microserver market might not be huge, it is growing. And Intel needs to play in it, as competition from chip makers using ARM architectures grows.

    That’s why Intel is following through with plans to start making power-sipping 22-nanometer Avoton system on chips (SoCs) with billions of transistors in the second half of this year. The “wimpy-core” Avoton chips built with the new Silvermont microarchitecture, announced in June at GigaOM’s Structure 2012 conference in San Francisco, target webscale data center deployments. They will be available for use in Hewlett-Packard’s new Project Moonshot servers.

    A Facebook spokesman has said the company looks forward to Avoton, as an earlier wimpy-core chip for microservers, code-named Centerton, didn’t appear to be capable of handling the social giant’s workloads. Whether Facebook adopts Avoton or not, Intel will need to be competitive on price in order to gain widespread adoption in microservers, as my colleague Stacey Higginbotham reported in December.

    Just a week after Applied Micro started shipping an ARM-based chip that contains networking capability, Intel is expected to announce a chip targeting networking, too. Intel will start production of its 22-nanometer Rangeley SoCs for networking devices in the second half of 2013. Lisa Graff, vice president and general manager of Intel’s data center marketing group, couldn’t provide details on Rangeley beyond the product’s name and basic purpose.

    At the same time, Intel has much more experience with brawny cores than wimpy cores. In the fourth quarter of the year, it will produce Ivy Bridge-EX chips in the Xeon E7 family with upgrades boosting memory capacity from around 4 TB to 12 TB. That’s helpful for in-memory databases. “We’ve been working with (SAP) on HANA, and this is exactly what they want — as much memory as we can possibly give them,” Graff said. “They would like (much) more.”

    Storage-specific SoCs in the Atom family and Haswell Xeon E3 processors that will go as low as 13 watts are also on the way, Intel plans to say.

    Beyond chips

    Besides the chip announcements, Intel is showing interest in working with webscale data centers by collaborating with Chinese companies Alibaba, Baidu, China Telecom and Tencent on Project Scorpio to build more efficient server racks for certain types of applications. Intel is developing rack-scale reference architecture that will show a wide variety of options for racks in hyperscale environments, which could allow products to emerge from Project Scorpio and the Open Compute Project.

    Taken together, the Intel announcements make the company look like it’s keen on staying top of mind for webscale deployments. But competition is more brawny than wimpy, and that’s why Intel needs to keep making its chips do more, use less energy and cost less money.


  • Blab predicts what people will tweet, blog and report on

    It’s one thing to monitor social statements on Twitter and other social networks as they happen. It’s another thing to predict what will happen over the next three days.

    Blab, a Seattle-based company, has emerged with a tool that lets companies do just that, with visualizations of where conversations will pop up from more than 50,000 sources, including Facebook, Tumblr, Twitter, YouTube, blogs and news outlets. It does this by paying close attention to where a conversation is now and predicting its path based on how similar past conversations unfolded. For example, if people started talking about a previous Amazon Web Services outage on Twitter and the conversation then moved to blogs and then to mainstream media outlets, the same pattern could play out in the case of another AWS outage. That’s why measuring the trajectory of each conversation and storing it for future reference is critical to Blab’s operations.
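    Blab hasn’t published its model, but the idea of matching a live conversation’s trajectory against stored ones can be sketched as a nearest-neighbor lookup over channel-volume sequences. Everything here — the per-hour mention counts, the distance metric, the history records — is invented for illustration:

```python
def distance(a, b):
    """Euclidean distance between two equal-length volume sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_next_channel(observed, history):
    """Find the stored conversation whose early trajectory best matches
    the live one, and return where that conversation went next."""
    n = len(observed)
    best = min(history, key=lambda h: distance(observed, h["trajectory"][:n]))
    return best["next_channel"]

# Invented records: hourly Twitter mention counts for past conversations,
# plus the channel each conversation spread to afterward.
history = [
    {"trajectory": [5, 40, 90], "next_channel": "blogs"},
    {"trajectory": [2, 3, 4], "next_channel": "none"},
]
print(predict_next_channel([6, 38, 85], history))  # blogs
```

A real predictive system would work over thousands of stored trajectories and attach a probability and confidence to each prediction, as the article notes Blab’s tool does, but the match-then-extrapolate shape is the same.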

    Blab also shows the top three influencers of a given conversation. Comments from more influential people can help Blab identify what the dominant ideas will be around a particular topic. Following Hugo Chavez’s death, for example, customers could have seen that the Bolivarian Revolution was going to turn out to be the hottest area of discussion.

    The Blab tool shows the probability and confidence of its predictions, so customers can get a sense of certainty. Possible use cases include updating advertisements and press releases with keywords and ideas to reflect forthcoming trends and get better results.

    Predictive analytics and modeling have already become popular. Now companies are thinking up new ways to make predictions based on unstructured data that businesses can get a hold of, and that’s where Blab fits in. There’s also PredPol, which predicts where crime will happen, so police officers can focus on specific areas, and MindMeld, which offers up information that could be useful based on your speech. Researchers have also been trying to gain insights on possible medical treatments and, yes, social-media trends.
