Author: Jordan Novet

  • Google mulls spicing up Google Maps Engine with Google Earth Engine imaging

    Google is looking into upgrading its Google Maps Engine with the large supply of satellite images available on Google Earth Engine, according to a company engineer. The move would enable businesses to perform comparative mapping analytics and show changes over time on the fly, without having to build out new infrastructure.

    After giving a talk about Google Earth Engine at the Strata conference in Santa Clara, Calif., on Thursday, Louis Perrochon, a Google engineering director, said engineers are working on the project, although there is no planned release date and it’s quite possible such capabilities will never be released. A Google spokesman said he could not confirm the plans or provide a timeline for implementation.

    Every day Google downloads terabytes of satellite images from the U.S. Geological Survey and maintains the files on spinning disks in data centers. With so much data, Google Earth Engine “allows you to do a lot more fancy stuff” than Google Maps Engine, Perrochon said during his talk. He demonstrated his point by using Google Earth Engine to show which San Francisco parks lie closest to BART stations, which parks are new to the city and which parts of the Sahara Desert have gotten new roads.

    The Google Earth Engine data sets, which span more than 25 years, have been available to researchers for a few years now. One use case is for a government to locate areas of deforestation and conduct investigations. As Perrochon demonstrated during his talk, users can quickly see where some areas have gotten new vegetation and other areas have been stripped of their trees.

    It might sound obvious, but because Google has so many data sets, Google Earth Engine can also do things like offer maps devoid of the clouds and stray lines that mar individual satellite images. To get a sense of the cloud problem, try zooming in on Northern Ireland. With Google Earth Engine data sets, users can quickly cycle through images from many days.
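
    The cloud-free mosaics come down to combining many acquisitions of the same spot and keeping only the clear observations. Here is a minimal Python sketch of that idea; it is not Google's pipeline, just a per-pixel median composite over a stack of images with cloudy pixels masked out.

    ```python
    import numpy as np

    def cloud_free_composite(images, cloud_masks):
        """Per-pixel median composite that ignores cloudy pixels.

        images      -- list of 2-D arrays, one per acquisition date
        cloud_masks -- list of boolean arrays, True where a pixel is cloudy
        """
        stack = np.stack([img.astype(float) for img in images])
        stack[np.stack(cloud_masks)] = np.nan   # drop cloudy observations
        return np.nanmedian(stack, axis=0)      # median of what remains

    # Toy example: three 2x2 "scenes", one pixel obscured in the first scene.
    imgs = [np.array([[255, 10], [12, 11]]),
            np.array([[9, 10], [13, 12]]),
            np.array([[11, 9], [12, 10]])]
    masks = [np.array([[True, False], [False, False]]),
             np.zeros((2, 2), dtype=bool),
             np.zeros((2, 2), dtype=bool)]
    print(cloud_free_composite(imgs, masks))    # the 255 "cloud" value is gone
    ```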

    Plenty of enterprises could benefit from having heaps of satellite images available for fast analysis on the web, as opposed to Google Earth. (Take media outlets, for starters.) If Google does roll out the expanded data sets to Google Maps Engine, it might cost enterprises less to get and analyze the data themselves. And it could once again demonstrate that Google Maps is well ahead of competitive efforts to map the planet and give people easy ways to access that data.


  • How connected should your baby be? Rest Devices ponders open data dilemma

    A forthcoming connected device — a onesie shirt that monitors a baby’s position, breathing, temperature and sound — poses the dilemma of how to make one’s little data open to collaboration with other systems developed for the internet of things.

    Each washable Peeko Monitor shirt from Rest Devices contains a strip that detects information and then connects to a thumb-sized data logger, which wirelessly sends data to a Wi-Fi Station. From that point, it’s relayed to Amazon Web Services, which normalizes the data and directs it to smartphone apps.

    A package of three shirts, the logger, the Wi-Fi station and the smartphone app will be available in stores for $199 following a planned July launch, Rest Co-founder and CEO Carson Darling told me at the Strata conference in Santa Clara, Calif., on Thursday. The company is also testing a shirt for adults that can monitor for sleep apnea.

    The Boston-based company, which has taken on around $500,000 in seed funding, has a few interesting questions to wrestle with, including:

    • How can Rest minimize incorrect alerts that unnecessarily wake up parents if a baby’s breathing changes in a normal way while also giving alerts that prove the device is working? The company is already getting feedback from parents using the product in beta tests to improve algorithms.
    • Should Rest partner with health systems to make sure babies can continue to be monitored even after they leave hospitals? Customers might be able to compare their babies’ live patterns with aggregate normal information or the babies’ individual tendencies, depending on how the product evolves.
    • Should Rest set off on the journey of getting regulatory approval as a medical device that can aid diagnosis and treatment? For now, Rest will release the Peeko Monitor as a non-regulated product, akin to a camera and microphone for baby-watching. But the Food and Drug Administration might eventually want to regulate it, as it did for a connected toothbrush.
    • Perhaps most importantly, how open should the data gathered about babies be? Darling recognized the value of tying such data in with other connected devices, such as a thermostat, which could be automatically lowered should that help parents and their baby get some sleep. And he knows the data could help people learn more about babies’ sleeping patterns, which could be valuable for the medical community. But Darling also said he wants to provide data where it’s useful.

    Whether the product will be able to push data into a larger system of many connected devices, such as Qualcomm’s AllJoyn peer-to-peer network, looks like an important question: more such devices are emerging, and managing all their data could become a hassle. My colleague Stacey Higginbotham has noted that as people adopt more and more connected devices, developers would be wise to think about ways for computers to make decisions on all the incoming data, rather than relying on humans to do it all on their own. And that means it would be best for the data to be open.

    Those questions aside, selling the Peeko Monitor to parents who are nervous as it is could be a challenge in itself.


  • Using Arduinos to make conferences better

    While walking around the Santa Clara Convention Center on Tuesday, I nearly stepped on an Arduino.

    A small and simple open-source computer board that can connect to sensors, the Arduino was one of 50 such gadgets that O’Reilly Media, host of the Strata Conference, planted around the facility. Sensors attached to the Arduinos pick up humidity, motion, sound and temperature data, which the boards wirelessly send to a ZigBee device that uploads it all to an Amazon Web Services cloud for real-time visualization and analysis, as well as later processing, said tech-book author Alasdair Allan, one of the people behind the project.
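
    As a rough sketch of the gateway side of the pipeline Allan describes, the Python snippet below reads comma-separated readings from a serial-attached ZigBee coordinator and forwards them to a cloud endpoint. The device path, line format and ingest URL are hypothetical; the project's real code is the GitHub repository mentioned later in this piece.

    ```python
    import serial    # pyserial
    import requests

    PORT = "/dev/ttyUSB0"                      # hypothetical ZigBee coordinator
    INGEST_URL = "https://example.com/ingest"  # hypothetical cloud endpoint

    def relay_readings():
        """Read 'mote_id,humidity,motion,sound,temp' lines and post them as JSON."""
        with serial.Serial(PORT, 9600, timeout=5) as link:
            while True:
                line = link.readline().decode("ascii", errors="ignore").strip()
                parts = line.split(",")
                if len(parts) != 5:
                    continue                   # skip blank or malformed lines
                mote_id, humidity, motion, sound, temp = parts
                payload = {
                    "mote": mote_id,
                    "humidity": float(humidity),
                    "motion": int(motion),
                    "sound": float(sound),
                    "temperature": float(temp),
                }
                requests.post(INGEST_URL, json=payload, timeout=10)

    if __name__ == "__main__":
        relay_readings()
    ```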

    Arduino-connected “Awesome” boxes for capturing audience feedback

    It’s the second time O’Reilly has deployed the devices at an event under its Data Sensing Lab project. The devices made their debut at the Strata conference in New York in October. What’s new this time was the appearance of 11 big red “awesome” buttons, each connected to an Arduino, that attendees can push on their way out of a talk to show that they liked it. If a particular speaker “kills” her talk, that’ll show and maybe she’ll get a bigger room next time. Or, if there’s a notable lack of enthusiasm, maybe she’ll get the boot.

    Sure, Allan, O’Reilly Founder and CEO Tim O’Reilly and Strata Chairman Edd Dumbill had fun throwing the Arduinos together and talking about the project, Allan said. But the technology could prompt O’Reilly to improve certain parts of the conference, such as counting people or getting lots of feedback. Plus, the project could end up being spun off to another company. It’s already inspired “Distributed Network Data,” an O’Reilly how-to book from Allan and co-author Kipp Bradford. And Allan has posted the code for the O’Reilly Arduinos, which the company calls sensor motes, on GitHub.

    The open nature of the project makes sense, as it can tie in with other systems of connected devices. If it stays like that, it could fit in well with an Android-like open ecosystem for the internet of things that my colleague Stacey Higginbotham envisions.



  • Big data can improve websites, cut energy bills, save lives

    In her keynote at the Strata conference in Santa Clara, Calif., on Tuesday, Rebecca Shockey, global research leader for business analytics and optimization at IBM’s Institute for Business Value, asked why about a quarter of respondents to a recent IBM survey had not yet started engaging in “big data activities.” Making the business case and showing potential returns on investment turned out to be a major obstacle to adoption, she said. Later at the conference, some of those still on the fence might have found some good ideas.

    Putting small energy data in perspective

    Barry Fischer, head writer at the data blog from Opower, the company that crunches data for utility companies serving almost half of all households in the United States, passed around sample bills that chart consumers’ year-to-year energy use and show how consumers compare with their neighbors. Besides the information on bills, Opower also provides alerts if consumers are on track to get a high energy bill and a Facebook app for consumers to compare their energy use with that of their friends. The data Opower collects — consumption figures from utilities, preferences from users and third-party weather, housing and demographic statistics — also enable Fischer and other company bloggers to present simple and consumer-friendly correlations, such as the fact that Yahoo Mail users typically pay $110 more per year in energy bills than those who use Google Mail. Taken together, Opower’s uses of data show how millions of people can benefit from contributing their own individual data.

    In January, GigaOM’s Katie Fehrenbacher named Opower one of her 13 energy data startups to watch in 2013.

    Etsy bakes a Funnel Cake

    In another session, three data engineers from Etsy revealed how they use Hadoop to detect issues with various functions on the website, and they talked about building a program other employees use to optimize the parts of the site that generate the most revenue. At Etsy, already a big Hadoop user — at one point, engineers ran 5,000 Hadoop jobs for a variety of purposes in a single month — a popular term is the attribution funnel, or the path customers take as they buy products on the site. The data engineers wanted other employees to be able to identify the steps where customers get caught up before purchasing, such as email address verification to establish new accounts. So they built a program called Funnel Cake, which scales better and delivers real-time information faster than Hadoop, said engineer Wil Stuckey. Running Funnel Cake, employees can streamline the process and increase the percentage of site visitors who end up buying products. Beyond that, they can see which kinds of pages lead to the most sales and focus more or less attention on browsing and searching functions or storefronts from product makers.
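
    The attribution funnel boils down to counting how many shoppers survive each step on the way to a purchase. The toy Python sketch below (not Etsy's Funnel Cake, whose internals weren't shown) computes that from a log of visitor-and-step events, with made-up step names.

    ```python
    from collections import defaultdict

    FUNNEL = ["view_item", "add_to_cart", "verify_email", "checkout", "purchase"]

    def funnel_report(events):
        """events: iterable of (visitor_id, step_name) pairs."""
        steps_by_visitor = defaultdict(set)
        for visitor, step in events:
            steps_by_visitor[visitor].add(step)

        reached_previous = None
        for i, step in enumerate(FUNNEL):
            # A visitor "reaches" a step only after completing every earlier one.
            reached = sum(1 for steps in steps_by_visitor.values()
                          if all(s in steps for s in FUNNEL[:i + 1]))
            rate = reached / reached_previous if reached_previous else 1.0
            print(f"{step:<14} {reached:>5} visitors  ({rate:.0%} of previous step)")
            reached_previous = reached

    events = [("a", "view_item"), ("a", "add_to_cart"), ("a", "verify_email"),
              ("a", "checkout"), ("a", "purchase"),
              ("b", "view_item"), ("b", "add_to_cart"),
              ("c", "view_item")]
    funnel_report(events)
    ```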

    Vending machines, advertisements and babies

    Other use cases on display at the conference ranged from vending machines to babies. One company has installed sensors on its vending machines and now monitors the resulting data in real time to spot theft and cut down on purchasing new machines to replace stolen ones. An internet advertising company now uses highly scalable software based on Hadoop MapReduce, Apache Nutch and Apache Solr to detect traffic sources for advertisements, bringing in new revenue. And a hospital’s neonatal intensive care unit has implemented a visualization tool for real-time health statistics that flags signs of “baby crashing” and thereby can reduce mortality rates.

    Executives from Aetna, Williams-Sonoma, Facebook and other companies will discuss big data use cases at the GigaOM Structure:Data conference in New York on March 20-21.


  • The Guardian’s data journalism is cool, but it takes three months to make

    Once he finds a suitable topic, Feilding Cage, a New York-based developer and journalist for The Guardian, can easily spend three months generating the source information and designing a visualization for what’s become known as data journalism. The results bring understanding and reader engagement to topics that are otherwise discussed with a lot of words or static numbers. Readers can and do play around with the information, share it widely and discuss it for long periods after it appears online.

    The Guardian’s interactive guide to gay rights in the United States

    Cage is one of a handful of Guardian journalists who generate reports that say new things about topics that pop up in the news or are just plain old interesting. Cage and his boss, Simon Rogers, editor of The Guardian Datablog and Data Store, spoke about their work at the Strata conference in Santa Clara, Calif., on Tuesday.

    Along with The Guardian, a few other news organizations have been putting an emphasis on data-driven reporting and visualizations, apps and even games in the past few years, such as the Chicago Tribune, the Los Angeles Times and ProPublica. (Check out the Data Journalism Handbook for more information on this sort of work.)

    Data journalism and visualization stand out for the verification and occasional gray-area explanations that journalists provide. Cage, for example, accompanied his interactive visualization of gay rights in the United States with a blog post explaining his methodology and disclosing his assumptions.

    Screenshot from Zoomdata’s big data analytics iPad app

    It’s certainly one way to say something fresh with data, but it’s time-consuming when you consider big data analytics apps such as Zoomdata that provide users with real-time information they can compare against Hadoop-processed historical data. (That company, which my colleague Derrick Harris covered last year, released the beta version of its iPad app on Tuesday.)

    It would be neat to find a happy medium for enterprises that want original insights that every employee can see, use and act on, but that don’t take three months to generate. That’s especially true because the return on investment for work like Cage’s is hard to identify, although it’s possible the content could indirectly generate revenue by driving users to content they have to pay for.

    Bridging the gap might be a matter of finding the perfect data scientist for the company. Or it might be a matter of time before the kind of work Cage does is automated. A computer already can write an earnings story, although it might be a few years before computers put wordsmiths out of business.

    Maybe it just doesn’t make sense to cross data journalism visualizations with big data analytics apps. But I, for one, would like to play with such a tool.

    Entrepreneurs from companies that work with and make visualizations from big data, such as Quid, will speak at the GigaOM Structure:Data conference on March 20-21 in New York.

    Disclosure: The Guardian is an investor in Giga Omni Media, which publishes GigaOM.


  • DataDirect Networks brings out Hadoop appliance for enterprises

    DataDirect Networks, a hardware vendor with roots in providing storage for high-performance computing, is introducing a Hadoop appliance for enterprises, adding to the trend of going with purpose-built hardware for big data deployments.

    DataDirect built hScaler to meet the speed and performance needs of its high-performance computing customers while offering ease of use for enterprise customers keen on Hadoop. Speed aside, hScaler stands out because it does away with direct-attached storage and incorporates RAID architecture instead. It lets users scale computing and storage resources independent of one another, precluding the chore of swapping out a server when a disk fails, as my colleague Derrick Harris has written.

    The hScaler appliance, which runs with the Hortonworks Data Platform, can move fast with InfiniBand storage capable of operating at 40 gigabits per second. In a sample configuration, 504 terabytes of storage are possible in a rack. The rack is four times as dense as a conventional data center rack, requiring less spending for cooling and square footage.

    Because they aim to speed up and simplify Hadoop deployments, appliances such as hScaler are catching on. Greenplum, Oracle, Teradata and other companies sell appliances capable of running Hadoop jobs, and with enterprises eager for easy, quick data analytics processing, DataDirect Chief Technology Officer Jean-Luc Chatelain sees the Hadoop hardware trend only getting bigger.

    Appliances could be useful for enterprises looking to run Hadoop jobs, as employees can save time and focus more on building applications. Big data veterans will talk about innovative uses of Hadoop and other big data technologies at the GigaOM Structure:Data conference on March 20-21 in New York.


  • Kickstarter campaign for new-age HyperCard development app wraps up

    Remember HyperCard, the Mac development program Apple released in the eighties to let people make screens and buttons for new programs? RunRev, a company based in the United Kingdom, developed LiveCode, HyperCard’s spiritual successor. The company took the unusual route of putting the idea on Kickstarter, hoping to get enough backers for a free, open-source version of the platform.

    Enterprises must pay to make Android and iOS apps with LiveCode. But that could change if the Kickstarter campaign succeeds in raising at least 350,000 British pounds, roughly $530,000. So far, with a little more than two days to go, backers, including Steve Wozniak, have offered to put up about 83 percent of the cash RunRev seeks for the open-source LiveCode. The open-source iteration in the works will support natural language, so as to be useful for people with little if any coding skills.

    There are a few ways to help people learn how to code. Universities could put computer-science courses on a free online-course site such as Coursera. (Other sites have made introductory materials available for free, too.) Developers can teach computer science to high school students. Apple could open-source HyperCard (it discontinued the software in 2004). Or RunRev or another company could make a free version of a HyperCard-inspired program such as LiveCode. These efforts have their upsides and downsides, but their shared objective certainly makes more sense now than it did in the eighties.



  • Why all that hacking news might not be so bad

    The list of companies that have reported being hacked just keeps growing, with Microsoft and Zendesk making headlines most recently. Although it’s caused plenty of anxiety for IT people and everyday users alike, there might just be an upside: The attacks have demonstrated the need for the kinds of information sharing the federal government wants to do to improve cybersecurity.

    Following the demise of one proposal, the Cyber Intelligence Sharing and Protection Act (CISPA), the Obama administration has taken new steps with an executive order and a policy strategy. The executive order draws a roadmap for the government to share more of its information with the private sector, and the strategy shows the intent to do more on the diplomatic and intelligence fronts.

    The Microsoft and Zendesk hacks follow others in recent weeks at Apple, Facebook, the New York Times, the Wall Street Journal and the Washington Post. Twitter said people had attempted to hack the site. And the security company Mandiant released a report providing details on a Shanghai-based division of the People’s Liberation Army of China that has stolen “hundreds of terabytes of data from at least 141 organizations,” almost all of which have headquarters in countries where English is the native language. Hackers even found a way to build a lure for a spear-phishing attack out of one version of the report.

    President Barack Obama, in his State of the Union address last week, acknowledged that American companies have been hacked and said the country must not “look back years from now and wonder why we did nothing in the face of real threats to our security and our economy.” Obama’s executive order on cybersecurity, released the same day the president gave the speech, directs the government to release more, and more timely, information on cybersecurity threats. It calls for a framework for reducing “cyber risks” to critical infrastructure in the United States, and the framework will have to help owners and operators of that infrastructure manage the risk. In doing so, the government cannot pick one product or service as a cure-all; it claims to value a competitive marketplace. The order also mandates that owners or operators of critical infrastructure that could cause catastrophes if hacked will be confidentially contacted and given a way to submit information to the federal government.

    A week after the executive order, the Obama administration released a policy paper laying out steps for advancing cybersecurity. It says businesses should share best practices, and it states that the FBI and the State Department will do more to try to stop hacks of trade secrets. Elsewhere, it promises that several other federal agencies will continue to do what they have been doing toward that end.

    Some people have argued that the executive order doesn’t do enough to improve cybersecurity. Then again, others like it much better than CISPA.

    Regardless of what people think about it, the federal government’s efforts to respond to the hacks could prompt more companies to protect their own assets. It takes advantage of the good parts of CISPA but not the bad, which my colleague Derrick Harris has previously identified. And with news of more and more attacks coming to the fore, more companies could be inclined to try sharing information with the federal government for the purpose of the greater good. How bad could that be?

    Oh, by the way, as a side effect of all of these attacks and the new federal policies, don’t be surprised to see more enterprises trying out security products that focus on infrastructure, such as Mandiant and Cylance, which I wrote about earlier this month. Look for more stealth-mode security startups jumping out of the shadows, too.

    Feature image courtesy of Shutterstock user Tatiana Popova.


  • VCs pour money into data startups during 2012

    Venture capitalists made more big data investments in 2012 than in any previous year, according to a report released Thursday from CB Insights and law firm Orrick.

    “As evidenced by financing and deal activity, Big Data is gaining steam,” the report stated. The number of big data venture deals in 2012 rose by about 24 percent compared with the previous year, going from 132 deals to 164. In the fourth quarter of 2012 alone, there were 49 venture deals for big data plays.

    Among them are Basho, ClearStory Data, Continuuity, Drawn to Scale, Mortar Data and WibiData.

    Despite the bump in the number of deals, the total amount of money VCs threw at big data startups in 2012 — $1.39 billion — was down by nearly 7 percent year over year. The median deal size decreased slightly, from $6 million to $5.7 million.

    The recipients of the biggest big data investments of 2012 were, in descending order, Cloudera ($65 million), Palantir Technologies ($56.1 million), Rocket Fuel ($50 million), 10gen ($42 million) and Nimble Storage ($40.7 million).

    Which VC firms closed the most big data deals last year? SV Angel, which invested in 14 companies. Sequoia Capital and IA Ventures tied for second, with 13 deals each, followed by New Enterprise Associates with 12 and First Round Capital with 10.

    The share of all big data deals going to infrastructure companies continued to fall, while big data analytics rose to a high of 48 percent of all deals. (The other category listed in the report is big data applications.)

    By far, California has been the leading state for big data venture deals in the past five years, with 230 in 2008-2012. New York, with 67 in that same time window, and Massachusetts, with 57, lag far behind.

    Assuming the trend lines continue, then, you would be most likely to get funded if you run a big data analytics company in California.

    As for 2013, the report cites a few big data investments VCs have already made so far: Sailthru and Nomi.

    Others include Ayasdi and Think Big Analytics, whose founder and CEO, Ron Bodkin, will speak at GigaOM’s Structure:Data conference in New York in a few weeks.

    In addition to Bodkin, entrepreneurs from other venture-backed big data startups who will speak at the conference include Justin Sheehy, chief technology officer at Basho; Jonathan Gray, Continuuity’s founder; Bradford Stephens, Drawn to Scale’s CEO; and Doug Daniels, chief technology officer of Mortar Data.

    Feature image courtesy of Shutterstock user extradeda.


  • Managed hosting providers offer up early-stage SDN use cases

    Software-defined networking (SDN) use cases are slowly emerging, giving IT people ideas about how improved agility and lower capital expenditures could play out in different settings. Who’s releasing the use cases? Managed hosting service providers, among others.

    Earlier this week I wrote about how NTT Communications has been rolling out SDN at multiple data centers around the world, to automate network configurations and provide other benefits. I also learned about how Peer 1 Hosting has signed up for SDN vendor Embrane’s software to round out the Peer 1 Infrastructure-as-a-Service (IaaS) cloud offering, and I found out that SunGard has started using those same products to lower response times for its Recover2Cloud disaster-recovery enterprise cloud. The increased agility from SDN and other innovations lets SunGard promise response times that are 30 to 40 percent shorter, and the company expects to offer better service-level agreements to its own customers as a result.

    Meanwhile, SDN company Nicira, which VMware acquired last year, has identified Rackspace, AT&T and DreamHost as customers. All three of those companies provide hosting services alongside other offerings.

    In 1999 or thereabouts, service providers were quick to jump onto the multiprotocol label switching (MPLS) bandwagon as a way to help information travel faster on a network, said Ram Shanmugam, SunGard’s senior director of product management. Now many of those same companies are standing up as early adopters of software-defined networking.

    And as that happens, it’s only natural for enterprises to witness the benefits of SDN and decide to give it a try, Shanmugam said. And thanks to SunGard’s market position, the shift could happen soon: Over 70 percent of Fortune 500 companies use SunGard for disaster recovery, Shanmugam said. Going forward, more SunGard clients could get exposed to the perks of SDN, as the company has been discussing the inclusion of SDN as well as software-defined storage for SunGard’s enterprise cloud.

    More SDN products hitting the market will also speed up adoption of the technology, which virtualizes networks and enables users to automatically provision firewalls and load balancers in a few minutes — something that took an engineer hours or days to do with a hardware appliance. The vendors are ready for the demand increase, or getting closer to that point. Networking hardware vendor Juniper Networks, soon after acquiring startup Contrail Systems, announced plans to release products later this year and next year that will allow for consolidation of hardware and connect network services on multiple devices. Cisco said in November 2012 it would buy Cariden, a company that’s come up with an SDN strategy. And just last week F5 Networks, another hardware vendor, acquired LineRate Systems, which is looking to help companies take on more web traffic with more easily scalable networks, as my colleague Derrick Harris wrote.

    So far, the promise of better agility has been one of the best motivators for companies to try out Embrane’s SDN products, and cost savings have taken a back seat, said Dante Malagrinò, Embrane’s CEO. This stands in some contrast to the adoption of server virtualization, where cost savings drove adoption among enterprise customers and the benefits of agility were only perceived later.


  • Facebook’s long Graph Search to-do list

    Facebook’s Graph Search, a tool released in beta last month, sounds like it could permit users to tap the intelligence of friends, and friends of friends, and so on. Once on it, you should be able to get tips on restaurants, music and movies from people you trust, or people who went to certain schools, or lived in certain countries.

    But what if the product is not quite ready? In some cases, search results might simply be disappointingly limited — for example, people who said on Facebook that they are from Japan “like” just two restaurants in New York City. That’s what Facebook engineer Mike Curtiss was able to pull up during a Graph Search demonstration for reporters on Thursday at the Facebook campus in Menlo Park, Calif.

    Search results pop up on your screen within a couple of seconds, thanks to the Unicorn search engine Facebook developed. The system’s method of drilling down on node after node of the users and cultural items that match a query appears to be a smart way of refining searches again and again, and Facebook’s flash-only database servers can support that activity in very short order.
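
    Facebook hasn’t published Unicorn’s internals, but the node-by-node narrowing Curtiss describes can be pictured as intersecting sets of graph edges. The toy Python sketch below, with made-up data, answers a query like “restaurants liked by my friends who live in London.”

    ```python
    friends = {"me": {"ana", "bob", "cal"}}
    lives_in = {"ana": "London", "bob": "Tokyo", "cal": "London"}
    likes = {"ana": {"Noodle Bar", "Pie Shop"},
             "bob": {"Sushi Go"},
             "cal": {"Pie Shop", "Curry House"}}
    category = {"Noodle Bar": "restaurant", "Pie Shop": "restaurant",
                "Sushi Go": "restaurant", "Curry House": "restaurant"}

    def restaurants_liked_by_friends_in(user, city):
        """Narrow the graph step by step: friends -> friends in city -> their likes."""
        in_city = {f for f in friends[user] if lives_in.get(f) == city}
        liked = set().union(*(likes.get(f, set()) for f in in_city))
        return {item for item in liked if category.get(item) == "restaurant"}

    print(restaurants_liked_by_friends_in("me", "London"))
    # e.g. {'Pie Shop', 'Noodle Bar', 'Curry House'}
    ```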

    But it’s not so much about the technology behind Graph Search: Limited search results are just one example of the tool’s current shortcomings for users. Even Facebook founder and CEO Mark Zuckerberg has asked the company’s search-infrastructure team to improve Graph Search, Curtiss said.

    So far, hundreds of thousands of people are using it, said Tom Stocky, a Facebook product manager. “It’s a long way to go between that and a billion,” Stocky said.

    Graph Search only works in English, and even that presents plenty of challenges. Since it debuted last month, Facebook has started supporting ways of calling up friends when people use words other than “friend” in the Graph Search box — homies, besties and the rest. But the tool doesn’t distinguish between people who live in a given city and those who lived there in the past. And it doesn’t filter out people who are near certain states. Likes and check-ins provide data for Graph Search to work with; status updates are not yet searchable.

    What’s more, plenty of Facebook users have taken certain information off Facebook or don’t update it anymore with real work information or their favorite movies. In other words, the quality of the data feeding the search results holds back what Graph Search can do.

    Facebook first started working on Graph Search two years ago. It’s not like every Facebook engineer has gone to work on the function, but some resources have been allocated to it, and that clearly will continue for at least the next few months.

    And there’s a lot to do in the way of query optimization, or structuring searches in ways that will yield better results, Curtiss said.

    The big question is whether the efforts going into Graph Search will pay off for Facebook in terms of bringing excitement among users back to the site, or whether Facebook will try to monetize Graph Search by giving businesses access to the tool. That could make Graph Search pay off in a big way.


  • Basho Technologies takes aim at more enterprises with upgrades

    A new version of Basho Technologies‘ increasingly popular Riak open-source NoSQL distributed database has added upgrades for enterprise customers, a growing area of interest for the Cambridge, Mass.-based company.

    Basho supports Riak, a database that allows sites such as GitHub and Bump Technologies to store, replicate and retrieve data at scale, even when multiple nodes fail. Because it’s open source, companies can adopt it easily after using legacy hardware to do similar work, and users have found it to be effective at a large scale. The latest feature upgrades build on the original version by increasing bandwidth between clustered databases as part of a premium data center replication feature.
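
    For readers who haven’t used Riak, here is a minimal sketch of storing and fetching a value over its HTTP interface, assuming a node is running locally on the default port (8098); the bucket and key names are made up, and error handling is omitted.

    ```python
    import requests

    BASE = "http://localhost:8098/buckets/users/keys"  # default Riak HTTP port

    # Store a JSON object under the key "alice".
    requests.put(f"{BASE}/alice",
                 data='{"name": "Alice", "plan": "pro"}',
                 headers={"Content-Type": "application/json"})

    # Fetch it back; Riak replicates the object across nodes behind the scenes.
    resp = requests.get(f"{BASE}/alice")
    print(resp.status_code, resp.text)
    ```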

    The new version of the replication function delivers improved performance and higher bandwidth by allowing more than one Transmission Control Protocol (TCP) connection for replicating data between clusters across data centers, and it is designed to meet the needs of customers operating the largest-scale systems. Riak 1.3 came about after what amounted to a rewrite of the original programming, said Andy Gross, Basho’s chief architect. Hundreds of companies employ the current replication feature, according to a Basho blog post, and can now roll out the new version.

    Going forward, the company could very well add block storage capability, not unlike Amazon Web Services’ Elastic Block Store.

    Whether or not that comes to pass, though, Basho has come a long way since its founding in 2008 in response to Amazon asking for a database that could support its cloud infrastructure.

    Last year Basho came out with Riak Cloud Storage for companies that want to offer Infrastructure as a Service (IaaS) compatible with AWS’ S3 storage product. It’s easy to understand why. The AWS application programming interface (API) is now the standard for cloud storage, as my colleague Barb Darrow wrote.

    To date the company has raised $39 million. That includes investments from IDC Frontier and Tokyo Electron Device Limited, among other groups. Comcast, Best Buy, Yahoo Japan and GitHub have signed up as users.

    Through a spokeswoman, Basho reported several indicators of growth in the past year: The payroll has grown to 115 employees from 60. It has opened offices in London and Tokyo. And now 25 percent of Fortune 50 companies use Riak.

    Basho’s chief technology officer, Justin Sheehy, will speak about why companies don’t need to put big-data strategies in place at the GigaOM Structure:Data conference on March 20-21 in New York.


  • NTT expands its IaaS geographies and touts its use of SDN

    NTT Communications is expanding its private enterprise cloud outside Asia with the addition of more data centers. The news is more than just a geographic expansion — it represents a full-on production use case for software-defined networking.

    A subsidiary of the NTT Group, NTT Communications announced its enterprise cloud by way of data centers in Hong Kong and Japan in June 2012. It was billed as “the world’s first cloud service to incorporate OpenFlow,” according to a news release. OpenFlow is a protocol that separates packet forwarding from routing decisions, which can then be handled by a separate controller rather than by the switch itself. Such separation has the potential to lower the cost of equipment and create interoperable gear that would allow buyers to program their network infrastructure without resorting to proprietary and complex programming options created by networking gear vendors.
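
    To make that split concrete, here is a toy Python sketch (not actual OpenFlow messages): a controller object holds the routing logic and installs match-to-action rules, while the switch object forwards packets by nothing more than a table lookup.

    ```python
    class ToySwitch:
        """Forwards packets purely by looking up rules a controller installed."""
        def __init__(self):
            self.flow_table = {}   # match fields -> action

        def install_rule(self, dst_ip, out_port):
            self.flow_table[("dst_ip", dst_ip)] = ("output", out_port)

        def handle_packet(self, packet):
            action = self.flow_table.get(("dst_ip", packet["dst_ip"]))
            if action is None:
                return "send to controller"       # unknown flow: ask the controller
            return f"forward out port {action[1]}"

    class ToyController:
        """Holds the routing logic; the switch holds none of it."""
        def __init__(self, switch):
            self.switch = switch

        def decide(self, dst_ip):
            out_port = 1 if dst_ip.startswith("10.") else 2   # toy routing decision
            self.switch.install_rule(dst_ip, out_port)

    switch = ToySwitch()
    controller = ToyController(switch)
    print(switch.handle_packet({"dst_ip": "10.0.0.5"}))   # send to controller
    controller.decide("10.0.0.5")
    print(switch.handle_packet({"dst_ip": "10.0.0.5"}))   # forward out port 1
    ```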

    Since last June, data centers in California, Virginia and Singapore have joined the NTT Communications lineup, and facilities in Australia, Malaysia and Thailand will come online in March, according to a news release.

    The new data centers will also use software-defined networking to give NTT and its clients more agility and lower costs. Implementing network virtualization in the data centers enables more flexible and automated configuration changes to the network connecting a customer’s servers, even across multiple data centers, according to a presentation NTT Communications executive Yukio Ito gave at last year’s Open Networking Summit.

    NTT isn’t completely new to SDN. Last year it was named as a customer of Nicira’s Network Virtualization Platform, as my colleague Stacey Higginbotham reported. The company was using Nicira controllers to move data sets from data center to data center following the earthquake off the Japanese coast that triggered a tsunami and led to subsequent nuclear accidents.

    While NTT is making a statement with its expansion of SDN-enabled data centers, other companies that run colocation or cloud facilities for enterprises, such as Rackspace and AT&T, could follow suit with similar offerings soon. After all, both of those companies are also Nicira customers, and hosting companies are popular targets for SDN deployments.

    In any case, the rush to deploy software-defined networking in production environments will continue, especially now that such a large provider has gone public with its deployment. Stacey predicted last month that 2013 would be the year big companies will see that their efforts to prevent network-hardware commoditization are doomed to fail.

    Feature image courtesy of Flickr user bandarji.


  • Bina launches box to analyze genomes; cloud on the way

    Bina Technologies is launching its Bina Box for on-premise genome processing, enabling researchers to quickly and cheaply analyze genomes and give doctors data-driven suggestions for custom treatments.

    Use a genome sequencer to see one person’s DNA profile, and you’ll get 6 billion unique characters, or half a terabyte of data, said Bina co-founder and CEO Narges Bani Asadi. Start processing it to find mutations and variations, and you’ll find yourself with more than one terabyte. It’s not small data. As the price of sequencing a genome keeps dropping, scientists will want to do this more and more. It’s a big data problem, Bani Asadi said. The company wants to solve the problem on premises, with hardware and software.

    The Bina Box will run on “high-end Intel processors and very high-bandwidth memory,” Bani Asadi said, and can scale out with additional Bina Boxes as customers’ processing needs change. Price depends on how much processing customers have in mind. If a customer wants to process 100 samples a month, for instance, it would cost $12,500 per month, or $125 per sample, said Mark Sutherland, Bina’s senior vice president of business development.

    A Bina Cloud to tie in with the Bina Box will come later this year. The Bina Cloud will host just the needle of genomic data isolated from among the haystack of the entire genome, and it will enable scientists to aggregate many genomes, run data visualizations and collaborate to derive big-picture insights. Early customers are already using a pilot version of the cloud.

    The box offering contributes more proof of the notion that, for certain uses, public clouds might not make sense, not yet anyway. (It remains a largely popular perspective in financial services, as my colleague Barb Darrow reported a couple of months ago.) The Bina Box, for its part, “provides security that on-premise solutions have, versus cloud solutions, (which) sometimes people in this industry are not completely ready to move into,” Bani Asadi said. Big pharmaceutical companies are a perfect example, as a breach could hamper product development using genomes. Aside from security, there’s the matter of performance. “It’s impossible to send (half a terabyte of raw data from a sequencer) to the cloud easily,” Bani Asadi said.

    Meanwhile, other genomics-focused startups, including DNAnexus and Appistry, are eschewing hardware and relying exclusively on cloud resources.

    Whether hardware is involved or not, as my colleague Derrick Harris mentioned when he wrote about Bina last year, it’s clear that the rise of big genomics inherently equates to a rise in data.

    The practice of merging life sciences and other industries with big data will come up in conversation when Ayasdi CEO Gurjeet Singh hits the stage at GigaOM’s Structure:Data conference on March 20 in New York.


  • Despite worries over recruitment startup DeveloperAuction, VCs are showing interest

    DeveloperAuction is a startup that aims to disrupt recruitment for software developers and other professions with low supply and high demand, by putting control in the hands of the people who will actually do the work. It’s a nice idea. AdRoll, Counsyl, Lookout and other companies have hired developers through DeveloperAuction.

    The trouble is in the execution. To some degree, the site is redundant, because developers already get plenty of offers without using the site, and big fees aren’t always involved. The site hasn’t been promoted as much as it could be. It prioritizes salary above all else, even though some developers don’t. Despite the write-ups it garnered in the tech press, a handful of developers I spoke with for this article had not heard of the site. While auctions run in private to keep developers’ current bosses from spotting attempts to find better work, at least one developer was fired for using the site. And to top it all off, the company’s name calls to mind the antiquated, antebellum idea of selling people as products.

    Nevertheless, co-founder Matt Mickiewicz told me venture capitalists are interested in funding the company. At nine employees, overhead isn’t enormous, and its business model allows for sizable revenues off transactions. The company is profitable, Mickiewicz said. Plus, it isn’t Mickiewicz’s first venture: he has co-founded a few startups, including 99designs, which has become profitable, taken on venture funding and expanded operations to multiple countries. And he’s a compelling entrepreneur. In a recent interview, Mickiewicz confidently delivered smart answers to tough questions.

    Value begets more value

    VCs refer more companies to the site than any other group, Mickiewicz said. A talented developer is “one of the biggest value adds to add value to (a VC’s) portfolio,” he said.

    Sure, the site’s concept makes sense for the developer segment, especially nowadays. The number of jobs in Silicon Valley increased by 4 percent from the second quarter of 2011 to the second quarter of 2012, according to data from California’s Employment Development Department that was included in the 2013 Silicon Valley Index. That high level of job growth hasn’t been seen since 2000.

    The website says a typical developer will receive five to 15 requests from venture-backed startups for job interviews. Developers stay available on auction for two weeks and are free to take any offer, not necessarily the one with the highest salary. The first auction ran in September.

    The company claims to be cheaper than a recruiter, but it’s not exactly cheap. If a company decides to pay a developer $100,000 a year through DeveloperAuction, the company pays either a $15,000 flat fee — 15 percent — or a $10,000 fee as well as the equivalent of $10,000 of the developer’s salary in stock options (10 percent plus 10 percent in options). DeveloperAuction even kicks in a small reward to developers as an incentive for using the service — 20 percent of the fee that the hiring company pays DeveloperAuction, which can be $3,000-$6,000 or more, according to the site. That appears to mean DeveloperAuction rakes in $12,000-$24,000 or more per hire.
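
    A quick back-of-the-envelope script, using the flat-fee option described above for a hypothetical $100,000 hire, shows where those figures come from.

    ```python
    def developerauction_economics(salary, flat_fee_rate=0.15, developer_cut=0.20):
        """Rough math based on the fee structure described in the article."""
        fee = salary * flat_fee_rate          # flat-fee option: 15% of first-year salary
        to_developer = fee * developer_cut    # 20% of the fee goes back to the developer
        to_site = fee - to_developer
        return fee, to_developer, to_site

    fee, dev, site = developerauction_economics(100_000)
    print(f"employer pays ${fee:,.0f}; developer gets ${dev:,.0f}; site keeps ${site:,.0f}")
    # employer pays $15,000; developer gets $3,000; site keeps $12,000
    ```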

    There’s competition, too. Besides more traditional hiring routes on company websites and job boards, so-called dev bootcamps have emerged as a new talent source.

    Issues bubble up

    However, last month the site received criticism on a few fronts in a Hacker News thread. Commenters complained about spam emails, shared alternative recruiting options (Pitchbox, for example), pointed out technical shortcomings and even called the company’s name into question. According to one user, “‘auction’ reminds me (of) the last time human beings were sold like stuff.” Another user reported being fired for using the site.

    Developer Zac Shenker of Collusion, a company with a plan to make iPad drawings shareable with a nifty pen, told me the site duplicates and commercializes what already happens naturally to those looking for new jobs, whether on LinkedIn, over email or at networking events.

    Recruiting managers at larger companies might be reluctant to use the site because of its emphasis on compensation packages above all else. A recruiting director at one webscale company who declined to be named for this article said the company would not hire people through DeveloperAuction, because finding people with the right character traits is more important than finding someone willing to work for a low price.

    And while the model seems sensible now, with developers in great demand, the most talented ones will get gobbled up quickly, resulting in a drop in quality, said Chris Hollindale, chief technology officer at Hasty, a stealth-mode startup creating technology that aims to make people healthier.

    Developers, customers abound

    Mickiewicz sounded unswayed as I brought up the issues.

    Regarding the comparison to slave auctions, he said people elect to join the auctions. That wasn’t the case in the pre-Civil War South.

    “I think that’s a very unfair comparison,” he said. He emphasized that developers don’t have to work for the highest bidder. “It’s about who tells the best story at the end of the day,” he said. Developers, he said, want to make “a meaningful impact.”

    Still, the founders might just have to consider changing the name, because it includes the word auction, he said.

    Overall, Mickiewicz cited the adoption of the website among job seekers — 10 apply for every auction spot — and employers alike as proof of its value. “The employers are seeing very, very good success with us, compared to any other platform,” he said.

    Hollindale intends to try out the site when Hasty is ready to hire another developer.

    “To me, it’s a very interesting kind of twist on the whole technical hiring process,” he said.

    Will the steady stream of developers availing themselves of the auctions de-escalate the tech bidding war? The answer to that question could determine the fate of DeveloperAuction.

    This story was corrected at 9:17 p.m. with a revised list of companies that have hired employees through DeveloperAuction. Dropbox and Quora made job offers through the site but did not hire.


  • Dropbox reportedly eyes IPO as it courts enterprise storage customers

    Dropbox, a major player in online document storage and sharing for consumers, has met with bankers ahead of a possible initial public offering later this year, according to a report Thursday from Quartz. If the timing is right — the IPO could come in the second half of 2013, Quartz reported — Dropbox would beat out Box, another growing online storage vendor, in the race to go public.

    While consumers have steadily flocked to Dropbox, the enterprise cloud storage space remains up for grabs. Box has long wanted to be the Dropbox of the enterprise space, as my colleague Barb Darrow reported last year. Box clients include Netflix, Dow Chemical and Procter & Gamble.

    Dropbox introduced new features for enterprise IT administrators on Tuesday, including reports on employees’ storage use and the ability to give or take away access to documents for certain users and devices. But IBM and other enterprises have forbidden employees from using Dropbox, showing that hurdles to adoption persist. (A Dropbox blog post challenges that notion, stating that “people at over two million businesses and 95% of Fortune 500 companies are using Dropbox,” but does not say whether all those companies pay for the service.)

    This is also a crowded space, with other enterprise cloud storage providers such as Google Drive, Microsoft’s SkyDrive, Accellion’s kitedrive, Egnyte, GroupLogic’s activEcho, SurDoc and ownCloud aiming for a piece of the market.

    Given that Dropbox has not emerged as the enterprise storage leader, it might be too early for the company to go in for an IPO, even with a $4 billion valuation and $257.2 million raised from Sequoia Capital, Institutional Venture Partners, Goldman Sachs and others. Perhaps it would be smarter to nail down enterprise cloud storage revenue first.

    Feature image courtesy of Shutterstock user Cheryl Ann Quigley.


  • Why Valentine’s Day needs data centers

    It’s a happy Valentine’s Day if you’ve found a match on an online dating site. But it could be a tough day for IT people at Match, eHarmony, OKCupid and other sites, which might face traffic booms as antsy users scramble to find last-minute dates.

    There’s certainly plenty of demand for the services. In 2009, CIO reported that more than 40 million Americans had tried online dating. In September 2012, 1 in 10 internet users frequented an online dating site, according to a December 2012 report from comScore.

    Sites vary as to the times of year when traffic peaks. The number of unique visitors to eHarmony.com increases 45 percent on Valentine’s Day, and the boost continues until the end of the month, a spokeswoman wrote in an email.

    The biggest day of the year for Match.com registrations isn’t Feb. 14; it’s actually Jan. 2. “Then we get another big spike after the Valentine’s Day holiday, so this weekend will be another spike,” a spokeswoman said. And for Zoosk.com, the peak comes on Dec. 26, while traffic is consistently heaviest in January.

    How do engineers accommodate all the traffic and not sacrifice performance?

    For eHarmony, it was a matter of scaling out infrastructure. “The systems over the years have been expanded to absorb large spikes to all the main areas and events on the site, such as posting photos, communication requests and the interactions with the mobile apps,” the company spokeswoman wrote.

    Computerworld reported in 2009 that eHarmony had 4 terabytes of user information in storage for 20 million users. That comes from responses to the site’s Relationship Questionnaire. The spokeswoman did not immediately have current figures available.

    Match.com was storing 70 terabytes of user data for more than 1 million subscribers when Microsoft published a case study on the dating service last March. Until May 2010, Match.com was updating user information on 110 Microsoft SQL Server servers across two data centers in the United States. In order to keep profile updates timely — less than two seconds — the company began distributing the updates across the servers, rather than update the entire dataset at once.
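
    Spreading updates across many database servers instead of rewriting one monolithic dataset is a standard sharding pattern. The sketch below is not Match.com’s SQL Server setup; it just shows the general idea of hashing each profile to one of many shards so an update touches only that machine.

    ```python
    import hashlib

    NUM_SHARDS = 110  # the case study mentions 110 SQL Server machines

    def shard_for(user_id, num_shards=NUM_SHARDS):
        """Deterministically map a user to a shard."""
        digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % num_shards

    def apply_profile_update(shards, user_id, update):
        """Write the update only to the shard that owns this profile."""
        shards[shard_for(user_id)][user_id] = update

    shards = [dict() for _ in range(NUM_SHARDS)]
    apply_profile_update(shards, "user-42", {"city": "Dallas", "age": 34})
    print(shard_for("user-42"), shards[shard_for("user-42")])
    ```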

    Valentine’s Day isn’t necessarily the high point of the year for Facebook. Jay Parikh, Facebook’s vice president of infrastructure engineering, cited Halloween as one of the highest times of year for photo uploading, as my colleague Stacey Higginbotham reported. When Facebook demand spikes, servers stocked with flash memory in the data center instead of hard disk drives and tapes ensure consistently high performance with a wide variety of data — and there’s plenty of room for storage, too. Facebook’s flash-only database servers, codenamed Dragonstone, feature 3.2 terabyte flash memory cards from Fusion-io. Flash memory might come in handy at dating sites’ data centers, too — the Dragonstone flash memory became part of the Open Compute Project last month.

    Feature image courtesy of Shutterstock user 3Dstock.


  • DataSift open-sources its social media analysis tool

    DataSift is releasing an open-source version of its Query Builder service to work alongside enterprises’ existing business-intelligence software, allowing more employees to gain more insight from social media mentions.

    The open-source version of Query Builder, which lets developers at existing DataSift customers simplify the tool’s appearance and functionality, might seem like a matter of crossing big data with even more data. But it’s an important step toward prompting business decisions based on what companies can learn about users of Twitter, Facebook and other outlets, rather than just seeing what people are saying. Social media analysis becomes more actionable and worthwhile with this sort of functionality.

    Plenty of companies ask Twitter to filter out certain parts of its enormous data set. But DataSift is one of just two companies licensed to syndicate the firehose of all Twitter feeds. (The other is Gnip.) Its internet-based Query Builder service also allows customers to run natural-language processing off the entire Twitter firehose and adjust it on the fly in several ways. The processing requires a massive amount of storage, to the tune of 1.3 petabytes, said Nick Halstead, DataSift’s founder and chief technology officer. With the open-source versions, developers can add the Query Builder’s streams and processing to business-intelligence platforms, and users won’t even be able to tell it’s running in the background, Halstead said.
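
    DataSift customers write filters in the company’s own query language, which isn’t reproduced here; the toy Python generator below simply illustrates the general idea of carving a useful slice out of a firehose of messages by language and keyword.

    ```python
    def filter_stream(messages, keywords, language="en"):
        """Yield only messages in the given language that mention any keyword."""
        wanted = {k.lower() for k in keywords}
        for msg in messages:
            if msg.get("lang") != language:
                continue
            text = msg.get("text", "").lower()
            if any(word in text for word in wanted):
                yield msg

    firehose = [
        {"lang": "en", "text": "Loving the new espresso machine at work"},
        {"lang": "es", "text": "Nueva cafetera en la oficina"},
        {"lang": "en", "text": "Traffic on the bridge again"},
    ]
    for hit in filter_stream(firehose, ["espresso", "coffee"]):
        print(hit["text"])
    ```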

    With Query Builder, which was announced in August, users can also pull in Amazon forum messages, YouTube comments, bitly links, Topix posts, Facebook status updates and other social statements, in addition to tweets. The data streams and insight on them all cost subscribing customers $3,000 or more per month. Those users will be able to use open-source versions and get more employees on board.


  • Oregon accelerator Upstart gets hip to enterprise startup trend

    Upstart Labs, an accelerator, has teamed up with Rogue Venture Partners in a deal that will provide more funding for enterprise-oriented startups. The partnership coincides with a shift for the Portland, Ore.-based accelerator, which has done several consumer-facing deals.

    The news falls in line with more interest among investors lately in funding enterprise-facing companies than consumer-facing ones. In November, VC Fred Wilson wrote about the trend, particularly in the context of later-stage investments.

    Upstart has worked with a few consumer-facing startups since it launched in 2011, including Chirpify, a mobile social-payment service, and Taplister, a website that shows which beers are on tap at your favorite bar. But beginning this year, Upstart will focus mostly on enterprise, mobile and Software-as-a-Service (SaaS) plays, said Kevin Tate, a general partner in the accelerator.

    So far this year, Upstart and Portland-based Rogue Venture Partners have collaborated to fund two companies, one of which, Measureful, is aimed at enterprises. Under the new partnership with Rogue, Upstart typically will be able to offer product development, hands-on mentoring and $100,000 to $250,000 in investments in exchange for equity, Tate said. Previous cash investments have come in smaller amounts.

    While they’re not as popular as general accelerators, enterprise-facing accelerators have been popping up as of late. In addition to Upstart — which has attracted startups from the Northwest, Hawaii and Canada — there’s also Acceleprise in Washington, D.C., and Alchemist Accelerator in the San Francisco Bay Area, both of which offer $30,000 in funding. Don’t be surprised to see more come online this year.


  • Report: Microserver market will keep rising. Who will be the market leaders?

    Shipments of microservers will rise threefold this year, a new report from IHS iSuppli predicts. But before getting too excited, note that that growth only means 291,000 microservers will be shipped.

    A microserver uses a bunch of densely packed, low-power chips. The configuration makes more sense for less demanding compute jobs, such as serving up contact information on one website user, than a server built around a brawnier, more capable core, which tends to use much more power. Webscale companies such as Facebook and Yahoo want to add microservers to lower their operating costs.

    Microserver shipments are going up faster than general servers and blade servers, according to IHS.

    And the product sales won’t stop this year. The forecast shows shipments increasing substantially each year through 2016, by which point microservers will represent one-tenth of overall server shipments.

    Still, those normal server shipments are huge; IDC estimates that 8.4 million servers were sold last year. The microserver market, for its part, is clearly still nascent. Nevertheless, the report does give an interesting insight: the microserver trend will only grow, not level out, through 2016.

    The report attributes the shipment increase to the need for lower-performance, lower-power chips in the data center and in smartphones.

    The billion-dollar question is, Which companies will capture the largest chunks of microserver revenue?

    On the processor side, Intel is vying for a sizable cut. In December the company unveiled an Atom-based processor that uses just 6 watts, as my colleague Stacey Higginbotham reported. But last year AMD snapped up SeaMicro, and Rackspace has already certified the new SM15000 — available with Intel Atom, Intel Xeon or AMD Opteron processors — for use in OpenStack.

    ARM could stand to gain from the microserver growth, too. In October AMD said it would license ARM’s chip technology to make chips for its own microservers. Plenty of other companies use, or plan to use, ARM’s intellectual property to build chips that could go in microservers, too, including Applied Micro and Calxeda, to name a couple.
