LWN.net Weekly Edition for June 6, 2013

An introduction to OpenStack

By Jake Edge
June 5, 2013
CloudOpen Japan 2013

In a CloudOpen Japan talk that included equal parts advocacy and information, Rackspace's Muharem Hrnjadovic looked at OpenStack, one of the entrants in the crowded open source "cloud" software derby. In the "tl;dr" that he helpfully provided, Hrnjadovic posited that "cloud computing is the future" and that OpenStack is the "cloud of the future". He backed those statements up with lots of graphs and statistics, but the more interesting piece was the introduction to what cloud computing is all about, as well as where OpenStack fits in that landscape.

Just fashion?

[Muharem Hrnjadovic]

Is "cloud computing" just a fashion trend, or is it something else, he asked. He believes that it is no mere fashion, but that cloud computing will turn the IT world "upside-down". To illustrate why, he put up a graph from an Amazon presentation that showed how data centers used to be built out. It was a step-wise function as discrete parts of the data center were added to handle new capacity, with each step taking a sizable chunk of capital. Overlaying that graph was the actual demand for the services, which would sometimes be above the build-out (thus losing customers) or below it (thus wasting money on unused capacity). The answer, he said, is elastic capacity and the ability to easily increase or decrease the amount of computation available based on the demand.

There are other reasons driving the adoption of cloud computing, he said. The public cloud today has effectively infinite scale. It is also "pay as you go", so you don't have to sink hundreds of thousands of dollars into a data center; you just get a bill at the end of the month. Cloud computing is "self-service" in that one can get a system ready to use without going through the IT department, which can sometimes take a long time.

Spikes in the need for capacity over a short period of time (like for a holiday sale) are good uses of cloud resources, rather than building more data center capacity to handle a one-time (or rare) event. Finally, by automating the process of configuring servers, storage, and the like, a company will become more efficient, so it either needs fewer people or can retrain some of those people to "new tricks". Cloud computing creates a "data center with an API", he said.
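
To make that API concrete, here is a minimal sketch (not from the talk) of booting and discarding a cloud server with the python-novaclient library of the day; the credentials, endpoint, image, and flavor names are all placeholders:

    from novaclient.v1_1 import client

    # Authenticate to the cloud; every value here is a placeholder
    nova = client.Client("demo-user", "secret", "demo-project",
                         "http://cloud.example.com:5000/v2.0/")

    # Pick an image and a machine size, then boot an instance
    image = nova.images.find(name="ubuntu-12.04")
    flavor = nova.flavors.find(name="m1.small")
    server = nova.servers.create("web-01", image, flavor)

    # ... and delete it again once the spike in demand has passed
    nova.servers.delete(server)

A handful of lines like these replace what used to be a purchase order, a shipment, and a trip to the rack, which is the sense in which the data center has acquired an API.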

OpenStack background

There are lots of reasons to believe that OpenStack is the cloud of the future, Hrnjadovic said. OpenStack has been called the "Linux of the cloud" because it is following the Linux growth path. In just three years, support for OpenStack from companies in the IT sector has "exploded". It was originally started by the US National Aeronautics and Space Administration (NASA) and Rackspace, though NASA eventually withdrew because OpenStack didn't fit with its organizational goals. When that happened, an independent foundation was created to "establish a level playing field". That made OpenStack into a credible project, he said, which helped get more companies on board.

The project is "vibrant", with an active community whose size is "skyrocketing". The graph of the number of contributors to OpenStack shows the classic "hockey stick" shape that is so pleasing to venture capitalists and other investors. Some of those graphs come from this blog post. There were 500+ contributors to the latest "Grizzly" release, which had twice as many changes as the "Essex" release one year earlier. The contributor base is a "huge force", he said; "think of what you could do with 500 developers at your disposal".

Where do these developers come from? Are they hobbyists? No, most of them are earning their paycheck by developing OpenStack, Hrnjadovic said. When companies join the foundation, they have to provide developers to help with the project, which is part of why the project is progressing so quickly.

Another indication of OpenStack's momentum is the demand for OpenStack skills in the job market. Once again, that graph shows "hockey stick" growth. Beyond that, Google Trends shows that OpenStack has good mindshare, which means that if you want to use OpenStack, you will be able to find answers to your questions, he said.

OpenStack consists of more than 330,000 lines of Python code broken up into multiple components. That includes the Nova compute component, various components for storage (block, image, and object), an identity component for authentication and authorization, a network management component, and a web-based dashboard to configure and control the cloud resources.

There is an incubation process to add new components to OpenStack proper. Two features went through the incubation process in the Grizzly cycle and are now being integrated into OpenStack: Heat, which is an orchestration service to specify and manage multi-tier applications, and Ceilometer, which allows measuring and metering resources. Several other projects (Marconi, Reddwarf, and Moniker) are in various stages of the incubation process now. The project is "developing at a fast clip", Hrnjadovic said.

There are a number of advantages that OpenStack has, he said. It is free, so you don't have to ask anyone to start using it. It is also open source (Apache licensed), so you "can look under the hood". It has a nice community where everyone is welcomed. The project is moving fast, both in squashing bugs and adding features. It is written in Python, which is "much more expressive" than C or Java.

A revolution

"There are some early warning signs that what we have here is a revolution", Hrnjadovic said. Cloud computing is an equalizer that allows individuals or startups to be able to "play the same games" as big companies. Because it has a low barrier to entry, you can "bootstrap a startup on a very low budget". Another sign that there is a revolution underway is that cloud computing is disruptive; the server industry is being upended. He quoted Jim Zemlin's keynote that for every $1 consumed in cloud services, there is $4 not being spent on data centers. Beyond that, there is little or no waiting for cloud servers, unlike physical servers that need to be installed in a data center, which can take some time. Lastly, cloud technologies provide "new possibilities" and allow doing things "you couldn't do otherwise".

In the face of a revolution, "you want to be on the winning side". Obviously, Hrnjadovic thinks that is OpenStack, but many of his arguments in the talk could equally apply to other open source cloud choices (Eucalyptus, CloudStack, OpenNebula, ...).

These days, everything is scaling horizontally (out) rather than vertically (up), because it is too expensive to keep upgrading to more and more powerful servers. So, people are throwing "gazillions" of machines—virtual machine instances, bare metal, "whatever"—at the problems. That many machines requires automation, he said. You can take care of five machines without automating things, but you can't handle 5000 machines that way.

Scaling out also implies "no more snowflakes". That means there are no special setups for servers; they are all stamped out the same. An analogy he has heard is that it is the difference between pets and cattle. If a pet gets injured, you take it to the veterinarian to get it fixed, but if one of a herd of cattle is injured, you "slaughter it brutally and move on". That's just what you do with a broken server in the cloud scenario; it "sounds brutal" but is the right approach.

Meanwhile, by picking OpenStack, you can learn about creating applications on an "industrial strength" operating system like Linux, learn how to automate repetitive tasks with Chef or Puppet, and pick up a bit of Python programming along the way. It is a versatile system that can be installed on anything from laptops to servers and can be deployed as a public or private cloud. Hybrid clouds are possible as well, where the base demand is handled by a private cloud and any overage in demand is sent to the public cloud; as a recent slogan he has heard puts it: "own the base and rent the spike".

Hrnjadovic finished with an example of "crazy stuff" that can be done with OpenStack. A German company called AoTerra is selling home heating systems that actually consist of servers running OpenStack. It is, in effect, a distributed OpenStack cloud that uses its waste heat to affordably heat homes. AoTerra was able to raise €750,000 via crowd funding to create one of the biggest OpenStack clouds in Germany—and sell a few heaters in the deal.

He closed by encouraging everyone to "play with" OpenStack. Developers, users, and administrators would all be doing themselves a service by looking at it.

[I would like to thank the Linux Foundation for travel assistance to Tokyo for CloudOpen Japan.]

Trusting upstream

By Jake Edge
June 4, 2013
LinuxCon Japan 2013

When one is trying to determine if there are compliance problems in a body of source code (either code from a device maker or from someone in the supply chain for a device), the sheer number of files to consider can be a difficult hurdle. A simple technique can reduce the search space significantly, though it does require a bit of a "leap of faith", according to Armijn Hemel. He presented his technique, along with a case study and a war story or two, at LinuxCon Japan.

[Armijn Hemel]

Hemel was a longtime core contributor to the gpl-violations.org project before retiring to a volunteer role. He is currently using his compliance background in his own company, Tjaldur Software Governance Solutions, where he consults with clients on license compliance issues. Hemel and Shane Coughlan also created the Binary Analysis Tool (BAT) to look inside binary blobs for possible compliance problems.

Consumer electronics

There are numerous license problems in today's consumer electronics market, Hemel said. There are many products containing GPL code with no corresponding source code release. Beyond that, there are products with only a partial release of the source code, as well as products that release the wrong source code. He mentioned a MIPS-based device that provided kernel source with a configuration file that chose the ARM architecture. There is no way that code could have run on the device using that configuration, he said.

That has led to quite a few cases of license enforcement in various countries, particularly Germany, France, and the US. There have been many cases handled by gpl-violations.org in Germany, most of which were settled out of court. Some went to court and the copyright holders were always able to get a judgment upholding the GPL. In the US, it is the Free Software Foundation, Software Freedom Law Center, and Software Freedom Conservancy that have been handling the GPL enforcement.

The origin of the license issues in the consumer electronics space is the supply chain. This chain can be quite long, he said; one he was involved in was four or five layers deep and he may not have reached the end of it. Things can go wrong at each step in the supply chain as software gets added, removed, and changed. Original design manufacturers (ODMs) and chipset vendors are notoriously sloppy, though chipset makers are slowly getting better.

Because it is a "winner takes all" market, there is tremendous pressure to be faster than the competition in supplying parts for devices. If a vendor in the supply chain can deliver a few days earlier than its competitors at the same price point, it can dominate. That leads to companies cutting corners. Some do not know they are violating licenses, but others do not care that they are, he said. Their competition is doing the same thing and there is a low chance of getting caught, so there is little incentive to actually comply with the licenses of the software they distribute.

Amount of code

Device makers get lots of code from all the different levels of the supply chain and they need to be able to determine whether the licenses on that code are being followed. While business relationships should be based on trust, Hemel said, it is also important to verify the code that is released with an incorporated part. Unfortunately, the number of files being distributed can make that difficult. If a company receives a letter from a lawyer requesting a response or fix in two weeks, for example, the sheer number of files might make that impossible to do.

For example, BusyBox, which is often distributed with embedded systems, is made up of 1700 files. The kernel used by Android has increased from 30,000 files (Android 2.2 "Froyo") to 36,000 (Android 4.1 "Jelly Bean"), and the 3.8.4 kernel has 41,000 files. Qt 5 is 62,000 files. Those are just some of the components on a device; when you add it all up, an Android system consists of "millions of files in total", he said. The line counts for just the C source files are similarly eye-opening, with 255,000 lines in BusyBox and 12 million in the 3.8.4 kernel.

At LinuxCon Europe in 2011, the Long Term Support Initiative (LTSI) was announced. As part of that, the Yaminabe project to detect duplicate work in the kernel was also introduced. That project focused on the changes that various companies were making to the kernel, so it ignored all files that were unchanged from the upstream kernel sources as "uninteresting". It found that 95% of the source code going into Android handsets was unchanged. Hemel realized that the same technique could be applied to make compliance auditing easier.

Hemel's method starts with a simple assumption: everything that an upstream project has published is safe, at least from a compliance point of view. Compliance audits should focus on those files that aren't from an upstream distribution. This is not a mechanism to find code snippets that have been copied into the source (and might be dubious, license-wise), as there are clone detectors for that purpose. His method can be used as a first-level pre-filter, though.

Why trust upstream?

Trusting the upstream projects can be a little bit questionable from a license compliance perspective. Not all of them are diligent about the license on each and every file they distribute. But the project members (or the project itself) are the copyright holders and the project chose its license. That means that only the project or its contributors can sue for copyright infringement, which is something they are unlikely to do on files they distributed.

Most upstream code is used largely unmodified, so using upstream projects as a reference makes sense, but you have to choose which upstreams to trust. For example, the Linux kernel is a "high trust" upstream, Hemel said, because of its development methodology, including the Developer's Certificate of Origin and the "Signed-off-by" lines that accompany patches. There is still some kernel code that is licensed as GPLv1-only, but there is "no chance" you will get sued by Linus Torvalds, Ted Ts'o, or other early kernel developers over its use, he said.
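
Those "Signed-off-by" lines are simple trailers at the end of each patch's commit message; by adding one, a developer certifies, per the Developer's Certificate of Origin, that they have the right to submit the code under its stated license. A hypothetical example:

    Signed-off-by: Random Developer <random@example.com>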

BusyBox is another high trust project as it has been the subject of various highly visible court cases over the years, so any license oddities have been shaken out. Any code from the GNU project is also code that he treats as safe.

On the other hand, the central repository for the Maven build tool for Java is an example of a low-trust (or no-trust) upstream. It is an "absolute mess" that has become a dumping ground for Java code, with unclear copyrights, unclear code origins, and so on. Hemel "cannot even describe how bad" the repository is; it is a "copyright time bomb waiting to explode", he said.

For his own purposes, Hemel chooses to put a lot of trust in upstreams like Samba, GNOME, or KDE, while not putting much in projects that pull in a lot of upstream code, like OpenWRT, Fedora, or Debian. The latter two are quite diligent about the origin and licenses of the code they distribute, but he conservatively chooses to trust upstream projects directly, rather than projects that collect code from many other different projects.

Approach

So, his approach is simple and straightforward: generate a database of source code file checksums (really, SHA256 hashes) from upstream projects. When faced with a large body of code of unknown origin, the SHA256 hash of each file is computed and compared against the database. Any files that are in the database can be ignored, while those that don't match can be analyzed or scanned further.
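
In outline, the approach requires nothing more exotic than a cryptographic hash and a lookup table. A minimal Python sketch of the idea (this is not BAT's implementation, and the paths are placeholders) might look like:

    import hashlib
    import os
    import sqlite3

    def sha256_of(path):
        """Hash a file's contents in chunks to bound memory use."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    db = sqlite3.connect("upstream.db")
    db.execute("CREATE TABLE IF NOT EXISTS hashes (sha256 TEXT PRIMARY KEY)")

    # Step 1: populate the database from a trusted upstream source tree
    for root, dirs, files in os.walk("/src/linux-3.8.4"):
        for name in files:
            db.execute("INSERT OR IGNORE INTO hashes VALUES (?)",
                       (sha256_of(os.path.join(root, name)),))
    db.commit()

    # Step 2: scan a tree of unknown origin; files not found upstream
    # are the ones that merit a closer look
    for root, dirs, files in os.walk("/src/vendor-kernel"):
        for name in files:
            path = os.path.join(root, name)
            known = db.execute("SELECT 1 FROM hashes WHERE sha256 = ?",
                               (sha256_of(path),)).fetchone()
            if known is None:
                print(path)

Only the files that fall out of the second loop need to be handed to license scanners, which is where the reduction in the search space comes from.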

In terms of reducing the search space, the method is "extremely effective", Hemel said. It takes about ten minutes for a scan of a recent kernel, which includes running Ninka and FOSSology on source files that do not match the hashes in the database. Typically, he finds that only 5-10% of files are modified, so the search space is quickly reduced by 90% or more.

There are some caveats. Using the technique requires a "leap of faith" that the upstream is doing things well and not every upstream is worth trusting. A good database that contains multiple upstream versions is time consuming to create and to keep up to date. In addition, it cannot help with non-source-related compliance problems (e.g. configuration files). But it is a good tool to help prioritize auditing efforts, even if the upstreams are not treated as trusted. He has used the technique for Open Source Automation Development Lab (OSADL) audits and for other customers with great success.

Case study

Hemel presented something of a case study that looked at the code on a Linux-based router made by a "well-known Chinese router manufacturer". The wireless chip came from a well-known chipset vendor as well. He looked at three components of the router: the Linux kernel, BusyBox, and the U-Boot bootloader.

The kernel source had around 25,000 files, of which just over 900 (or 4%) were not found in any kernel.org kernel version. 600 of those turned out to be just changes made by the version control system (CVS/RCS/Perforce version numbers, IDs, and the like). Some of what was left were proprietary files from the chipset or device manufacturers. Overall, just 300 files (1.8%) were left to look at more closely.

For BusyBox, there were 442 files and just 62 (14%) that were not in the database. The changed files were mostly just version control identifiers (17 files), device/chipset files, a modified copy of bridge-utils, and a few bug fixes.

The situation was much the same for U-Boot: 2989 files scanned, with 395 (13%) not in the database. Most of those files were either chipset vendor files or ones with Perforce changes, but there were several with licenses other than the GPL (which is what U-Boot uses). There was also a file containing the text "Permission granted for non-commercial use", which is not a condition the router maker could meet. As it turned out, the file was just present in the U-Boot directory and was not used in the binary built for the device.

Scripts to create the database are available in BAT version 14; a basic scanning script is coming in BAT 15 but is already available in the Subversion repository for the project. Fancier tools are available to Hemel's clients, he said. One obvious opportunity for collaboration, which did not come up in the talk, would be to collectively create and maintain a database of hash values for high-profile projects.

How to convince the legal department that this is a valid approach was the subject of some discussion at the end of the talk. It is a problem, Hemel said, because legal teams may not feel confident about the technique even though it is a "no brainer" for developers. Another audience member suggested that giving examples of others who have successfully used the technique is often the best way to make the lawyers comfortable with it. Also, legal calls, where lawyers can discuss the problem and possible solutions with other lawyers who have already been down that path, can be valuable.

Working with the upstream projects to clarify any licensing ambiguities is also useful. It can be tricky to get those projects to fix files with an unclear license, especially when the project's intent is clear. In many ways, "git pull" (and similar commands) has made it much easier to pull in code from third-party projects, but sometimes that adds complexity on the legal side. That is something that can be overcome with education and by working with those third-party projects.

[I would like to thank the Linux Foundation for travel assistance to Tokyo for LinuxCon Japan.]

Diversity and recruiting developers

By Nathan Willis
June 5, 2013
Texas Linux Fest 2013

At Texas Linux Fest 2013 in Austin, Rikki Endsley from the USENIX Association spoke about a familiar topic, diversity in technology companies and software projects, but from a different angle. Specifically, she looked at how companies recruit new team members, and the sorts of details that can unintentionally keep applicants away. Similarly, there are practices that companies can engage in to help them retain more of their new hires, particularly those who come from a different background than their co-workers.

[Rikki Endsley at TXLF]

A lot of what Endsley said was couched in terms of "hiring," but she said that it applies to recruiting volunteers to open source projects as well. As most people are aware, demographic diversity in technical fields is lower than in the population at large, she said, and it is particularly low in free software projects. Of course, these days paid employees do a large share of the work on free software projects; for companies that manage or produce open source code, the diversity problem is indeed one of finding, hiring, and retaining people.

Everyone understands the value of hiring a diverse team, Endsley said, but a fairly common refrain in technology circles is "we don't have any women on our team because none applied." Obviously there are women out there, she noted; the challenge is just to make sure that they know about one's company and its job opportunities. This can be a problem in any scientific or engineering field, she said, but it is particularly troublesome in open source, where the demand for developers already exceeds the supply. In a job-seeker's market, companies need to "sell" themselves to the employee, not vice-versa, so if your company is not getting the applicants it would like to see, you ought to look closely at how you sell yourself, and be adaptable.

Endsley said that she did not have all of the answers to how to recruit more diverse applicants, but she did at least have a number of things that a concerned company could try. Most of her observations dealt directly with recruiting women, but she said that the advice applied in general to other demographics as well. She offered examples that addressed other diversity angles, including ethnicity and age.

The hunt

Recruiting really begins with identifying what a company needs, she said. It is tempting to come up with a terse notion of what the new recruit will do (e.g., "a Python programmer"), but it is better to consider other facets of the job: representing the company at events, helping to manage teams and projects, etc. The best plan, though, is to come up with not one, but three or four "talent profiles," then go out and change recruiting practices to find the people that fit.

Where one looks for new talent is important. Not everyone who meets the talent profile is reading job-board sites like Monster.com. Companies can find larger and more diverse pools of potential talent at events like trade shows and through meetups or personal networking groups. In short, "think about where people engage" and go there. After all, not everyone that you might want to hire is out actively looking for a job. It can also help to reach out on social networks (where, Endsley noted, the real value comes from the "word of mouth" of other people spreading the news that your company is hiring) and to create internship programs.

Apart from broadening the scope of the search, Endsley said that a company's branding can greatly influence who responds to job ads. Many startups, she said, put a lot of emphasis on the corporate culture—particularly being the "hip" place to work and having games and a keg in the break room. But that image only appeals to a narrow slice of potential recruits. What comes across as hip today is only likely to appeal to Millennials, not to those in Generation X or earlier. In contrast, she showed Google's recruiting slogan, "Do cool things that matter." It is simple and, she said, "who doesn't want to do cool things that matter?"

Companies should also reconsider the criteria that they post for their open positions, she said. She surveyed a number of contacts in the technology sector and asked them what words they found to be a turn-off in job ads. On the list of negatives were "rock star," "ninja," "expert," and "top-notch performer." The slang terms again appeal only to a narrow age range, while the survey respondents said all of them suggest an atmosphere where "all my colleagues will be arrogant jerks." Similarly, the buzzwords "fast-paced and dynamic" were often interpreted to mean "total lack of work/life balance and thoughtless changes in direction." The term "passionate" suggested coworkers likely to lack professionalism and argue loudly, while the phrase "high achievers reap great rewards" suggested back-stabbing coworkers ready to throw you under the bus to get ahead.

Endsley showed a number of real-world job ads (with the names of the companies removed, of course) to punctuate these points. There were many that used the term "guys" generically or "guys and gals", which she said would not turn off all female applicants, but would reasonably turn off quite a few. There were plenty of laughably bad examples, including one ad that devoted an entire paragraph to advertising the company's existing diversity—but did so by highlighting various employees' interests in fishing, motorcycle-racing, and competitive beard-growing. Another extolled the excitement of long work days "in a data center with a rowdy bunch of guys." Honestly, Endsley observed, "that's really not even going to appeal to many other guys."

Onboarding and retention

After successfully recruiting an employee, she said, there is still "onboarding" work required to get the new hire adjusted to the company, engaged in the job, and excited about the work. Too often, day one involves handing the new hire an assignment and walking away. That is detrimental because research shows that most new hires decide within a matter of months whether or not they want to stay with a company long term (although Endsley commented that in the past she has decided within a few hours that a new company was not for her).

She offered several strategies to get new hires acclimated and connected early. One is to rotate the new employee through the whole company a few days or weeks at a time before settling into a permanent team. This is particularly helpful for a new hire who is in the minority at the office; for instance, the sole female engineer on a team would get to meet other women in other teams that she otherwise might not get to know at all. Building those connections makes the new hire more likely to stay engaged. It is also helpful to get the new hire connected to external networks, such as going to conferences or engaging in meetups.

Retaining employees is always an area of concern, and Endsley shared several strategies for making sure recent hires are happy—because once an at-risk employee is upset, the chances are much higher that the company has already lost the retention battle. One idea is to conduct periodic motivation checks; for example, in the past USENIX has asked her what it would take for her to leave for another job. Checks like these need to be done more than once, she noted, since the factors that determine whether an employee stays or leaves change naturally over time. Companies can also do things to highlight the diversity of their existing employees, she said; Google is again a good example of doing this kind of thing right, holding on-campus activities and events to celebrate different employees' backgrounds, and cultivating meetup and interest groups.

Another important strategy is to have a clear and fair reward system in place. No one likes finding out that a coworker makes more money for doing the same work solely because they negotiated differently during the interview. And it is important that there be clear ways to advance in the company. If developers cannot advance without shifting into management, they may not want to stay. Again, most of these points are valuable for all employees, but their impact can be greater on an employee who is in the minority—factors like "impostor syndrome" (that is, the feeling that everyone else in the group is more qualified and will judge you negatively) can be a bigger issue for an employee who is already the only female member of the work group.

The audience asked quite a few questions at the end of the session. One was from a man who had concerns that hiring for diversity can come across as hiring a token member of some demographic group. Endsley agreed that it can certainly be interpreted that way, if done wrong. But her point was not to give advice to someone who would think "I need two more women on my team", but to someone who is interested in hiring from a diverse pool of applicants. That is, someone who says "I have no women on my team, and none are applying; what am I doing wrong?" Most people these days seem to agree on the benefits of having a diverse team, but most still have plenty of blind spots that can be improved upon. With demand for developers working on open source code exceeding supply, successfully reaching the widest range of possible contributors is a wise move anyway.

Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Security: Smack for Tizen; New vulnerabilities in kernel, mesa, wireshark, xmp, ...
  • Kernel: The multiqueue block layer; Reliable user-space OOM handling; Power-aware scheduling.
  • Distributions: Debian, Iceweasel, and security; Arch, Debian, Fedora, RHEL, ...
  • Development: Mobile health initiatives; PulseAudio 4.0; PyTables 3.0; Source code CSS; ...
  • Announcements: RIP Atul Chitnis, events, ...

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds