In a CloudOpen Japan talk that included equal parts advocacy and information,
Rackspace's Muharem Hrnjadovic looked at OpenStack, one of the entrants in the
crowded open source "cloud" software derby. In the "tl;dr" that he
helpfully provided, Hrnjadovic posited that "cloud computing is the future" and
that OpenStack is the "cloud of the future". He backed those statements up
with lots of graphs and statistics, but the more interesting piece was the
introduction to what cloud computing is all about, as well as where
OpenStack fits in that landscape.
Just fashion?
Is "cloud computing" just a fashion trend, or is it something else, he
asked. He believes that it is no mere fashion, but that cloud computing
will turn the IT world "upside-down". To illustrate why, he put up a graph
from an Amazon presentation that showed how data centers used to be built
out. It was a step-wise function as discrete parts of the data center were
added to handle new capacity, with each step taking a sizable chunk of
capital. Overlaying that graph was the actual demand for the services, which
would sometimes be above the build-out (thus losing customers) or below it
(thus wasting money on unused capacity). The answer, he said, is elastic
capacity and the ability to easily increase or decrease the amount of
computation available based on the demand.
There are other reasons driving the adoption of cloud computing, he said.
The public cloud today has effectively infinite scale. It is also "pay as
you go", so you don't have to sink hundreds of thousands of dollars into a
data center; you just get a bill at the end of the month. Cloud computing
is "self-service" in that one can get a system ready to use without going
through the IT department, which can sometimes take a long time.
Spikes in the need for capacity over a short period of time (like for a
holiday sale) are good uses of cloud resources, rather than building more
data center capacity to handle a one-time (or rare) event. Finally, by
automating the process of configuring servers, storage, and the like, a
company will become more efficient, so it either needs fewer people or can
retrain some of those people to "new tricks". Cloud computing creates a
"data center
with an API", he said.
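To make the "data center with an API" idea concrete, here is a minimal
sketch (not something Hrnjadovic showed) using the openstacksdk Python
library; the cloud, image, and flavor names are all hypothetical:

    # Launch a server through the cloud's API instead of racking hardware.
    import openstack

    # Credentials come from a clouds.yaml entry; "mycloud" is a made-up name.
    conn = openstack.connect(cloud='mycloud')

    # Look up an image and a flavor by name (both names are hypothetical).
    image = conn.compute.find_image('ubuntu-server')
    flavor = conn.compute.find_flavor('m1.small')

    # Create the instance and wait for it to finish booting.
    server = conn.compute.create_server(
        name='demo-server', image_id=image.id, flavor_id=flavor.id)
    server = conn.compute.wait_for_server(server)
    print(server.status)   # "ACTIVE" once the instance is running

The same API can tear the server down again when demand drops, which is
what makes the elastic, pay-as-you-go model work.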
OpenStack background
There are lots of reasons to believe that OpenStack is the cloud of the
future, Hrnjadovic said. OpenStack has been called the "Linux of the cloud"
because it is following the Linux growth path. In just three years,
support for OpenStack from companies in the IT sector has "exploded". It
was originally started by the US
National Aeronautics and Space Administration (NASA) and Rackspace, though
NASA eventually withdrew because OpenStack didn't fit with its organizational
goals. When that happened, an independent foundation was created to
"establish a level playing field". That made OpenStack into a credible
project, he said, which helped get more companies on board.
The project is "vibrant", with an active community whose size is
"skyrocketing". The graph of the number of contributors to OpenStack shows
the
classic "hockey stick" shape that is so pleasing to venture capitalists and
other investors. Some of those graphs come from a blog post. There were 500+
contributors to the latest "Grizzly"
release, which had twice as many changes as the "Essex" release one
year earlier. The contributor base is a "huge force", he said; "think of
what you could do with 500 developers at your disposal".
Where do these developers come from? Are they hobbyists? No, most of them
are earning their paycheck by developing OpenStack, Hrnjadovic said. When
companies enter the foundation, they have to provide developers to help
with the project, which is part of why the project is progressing so quickly.
Another indication of OpenStack's momentum is the demand for OpenStack
skills in the job market. Once again, that graph shows "hockey stick"
growth. Beyond that, Google Trends shows that OpenStack has good
mindshare, which means that if you want to use OpenStack, you will be able
to find answers to your
questions, he said.
OpenStack consists of more than 330,000 lines of Python code broken up into
multiple components. That includes the Nova compute component, various
components for storage
(block, image, and object), an identity component
for authentication and authorization, a network
management component, and a web-based
dashboard to configure and control the cloud resources.
There is an incubation process to add new components to OpenStack proper.
Two projects went through the incubation process in the Grizzly cycle and
are now being integrated into OpenStack: Heat,
which is an orchestration service to specify and manage multi-tier
applications, and Ceilometer, which
allows measuring and metering resources. Several other projects (Marconi, Reddwarf, and Moniker) are in various
stages of the incubation process now. The project is "developing at a fast
clip", Hrnjadovic said.
There are a number of advantages that OpenStack has, he said. It is free,
so you don't have to ask anyone to start using it. It is also open source
(Apache licensed), so you "can look under the hood". It has a nice
community where everyone is welcomed. The project is moving fast, both in
squashing bugs and adding features. It is written in Python, which is "much
more expressive" than C or Java.
A revolution
"There are some early warning signs that what we have here is a revolution",
Hrnjadovic said. Cloud computing is an equalizer that allows individuals or startups
to "play the same games" as big companies. Because it has a low
barrier to entry, you can "bootstrap a startup on a very low budget".
Another sign that there is a revolution underway is that cloud computing is
disruptive; the server industry is being upended. He quoted Jim Zemlin's
keynote that for every $1 consumed in cloud services, there is $4 not being
spent on data centers. Beyond that, there is little or no waiting for
cloud servers, unlike physical servers that need to be installed in a data
center, which can take some time. Lastly, cloud technologies provide "new
possibilities" and allow doing things "you couldn't do otherwise".
In the face of a revolution, "you want to be on the winning side".
Obviously, Hrnjadovic thinks that is OpenStack, but many of his arguments in the
talk could equally apply to other open source cloud choices (Eucalyptus,
CloudStack, OpenNebula, ...).
These days, everything is scaling
horizontally (out) rather than vertically (up), because it is too expensive
to keep upgrading to more and more powerful servers. So, people are
throwing "gazillions" of machines—virtual machine instances, bare
metal, "whatever"—at the problems.
That many machines requires automation, he said. You can take care of five
machines without automating things, but you can't handle 5000 machines that
way.
Scaling out also implies "no more snowflakes". That means there are no special
setups for servers; they are all stamped out the same. An analogy he has
heard is that it is the difference between pets
and cattle. If a pet gets injured, you take it to the veterinarian to get
it fixed, but if one of a herd of cattle is injured, you "slaughter it
brutally and move on". That's just what you do with a broken server in the
cloud scenario; it "sounds brutal" but is the right approach.
Meanwhile, by picking OpenStack, you can learn about creating applications
on an "industrial
strength" operating system like Linux, learn how to automate repetitive
tasks with Chef or Puppet, and pick up a bit of Python programming along
the way. It is a versatile system that can be installed on anything from
laptops to servers and can be deployed as a public or private cloud.
Hybrid clouds are possible as well, where the base demand is handled by a
private cloud and any overage in demand is sent to the public cloud; a
recent slogan he has heard: "own the base and rent the spike".
Hrnjadovic finished with an example of "crazy stuff" that can be done with
OpenStack. A German company called AoTerra is selling
home heating systems that actually consist of servers running
OpenStack. It is, in effect, a distributed OpenStack cloud that uses its
waste heat to affordably heat homes. AoTerra was able to raise €750,000 via
crowd funding to create one of the biggest OpenStack clouds in Germany—and
sell a few heaters in the deal.
He closed by encouraging everyone to "play with" OpenStack. Developers,
users, and administrators would all be doing themselves a service by
looking at it.
[I would like to thank the Linux Foundation for travel assistance to Tokyo
for CloudOpen Japan.]
When one is trying to determine if there are compliance problems in a body
of
source code—either code from a device maker or from someone in the supply chain
for a device—the sheer number of files to consider can be a difficult
hurdle. A simple technique can reduce the search space
significantly, though it does require a bit of a "leap of faith", according
to Armijn Hemel. He presented his technique, along with a
case study and a war story or two, at LinuxCon Japan.
Hemel was a longtime core contributor to the gpl-violations.org project before retiring
to a volunteer role. He is currently using his compliance background
in his own company, Tjaldur
Software Governance Solutions, where he consults with clients on
license compliance issues. Hemel and Shane Coughlan also created the Binary Analysis Tool (BAT)
to look inside binary blobs
for possible compliance problems.
Consumer electronics
There are numerous license problems in today's consumer electronics market,
Hemel said. There are many products containing GPL code with no
corresponding
source code release. Beyond that, there are products with only a partial
release of the source code, as well as products that release the wrong
source code. He mentioned a MIPS-based device that provided kernel source
with a configuration file that chose the ARM architecture. There is no way
that code could have run on the device using that configuration, he said.
That has led to quite a few cases of license enforcement in various
countries, particularly Germany, France, and the US. There have been many
cases handled by gpl-violations.org in Germany, most of which were settled
out of court. Some went to court and the copyright holders were always
able to get a judgment upholding the GPL. In the US, it is the Free
Software Foundation, Software
Freedom Law Center, and Software Freedom
Conservancy that have been
handling the GPL enforcement.
The origin of the license issues in the consumer electronics space is the
supply chain. This chain can be quite long, he said; one he was involved
in was four or five layers deep and he may not have reached the end of it.
Things can go wrong at each step in the supply chain as software gets
added, removed, and changed. Original design manufacturers (ODMs) and
chipset vendors are notoriously sloppy, though chipset makers are slowly
getting better.
Because it is a "winner takes all" market, there is tremendous pressure to
be faster than the competition in supplying parts for devices. If a vendor
in the supply chain can deliver a few days earlier than its competitors at
the same price point, it can dominate. That leads to companies cutting
corners. Some do not know they are violating licenses, but others do not
care that they are, he said. Their competition is doing the same thing and
there is a low chance of getting caught, so there is little incentive to
actually comply with the licenses of the software they distribute.
Amount of code
Device makers get lots of code from all the different levels of
the supply chain and they need to be able to determine whether the licenses
on that code are being followed.
While business relationships should be based on trust, Hemel said, it is
also important to verify the code that is released with an incorporated
part. Unfortunately, the number of files being distributed can make that
difficult.
If a company receives a letter from a lawyer requesting a
response or fix
in two weeks, for example, the sheer number of files might make that
impossible to do.
For example, BusyBox, which is often distributed with embedded systems, is
made up of 1700 files. The kernel used by Android has increased
from 30,000 (Android 2.2 "Froyo") to 36,000 (Android 4.1 "Jelly
Bean")—and the 3.8.4 kernel has
41,000 files. Qt 5 is 62,000 files. Those are just some of the
components on a device; when you add it all up, an
Android system consists of "millions of files in total", he said. The
lines of code in just the C source files is similarly eye-opening, with
255,000 lines in BusyBox and 12 million in the 3.8.4 kernel.
At LinuxCon Europe in 2011, the
long-term support initiative was
announced. As part of that, the Yaminabe
project to detect duplicate work in the kernel was also introduced.
That project focused on the changes that various companies were making to
the kernel, so it ignored all files that were unchanged from the upstream
kernel sources as "uninteresting". It found that 95% of the source code
going into Android handsets was unchanged. Hemel realized that the same
technique could be applied to make compliance auditing easier.
Hemel's method starts with a simple assumption: everything that an upstream
project has published is safe, at least from a compliance point of view.
Compliance audits should focus on those files that aren't from an
upstream distribution. This is not a mechanism to find code snippets that
have been copied into the source (and might be dubious, license-wise),
as there are clone detectors for that purpose. His method can be used as a
first-level pre-filter, though.
Why trust upstream?
Trusting the upstream projects can be a little bit questionable from a license
compliance perspective. Not all of them are diligent about the license on
each and every file they distribute. But the project members (or the
project itself) are the copyright holders and the project chose its
license. That means that only the project or its contributors can sue for
copyright infringement, which is something they are unlikely to do on files
they distributed.
Most upstream code is used largely unmodified, so using upstream projects
as a reference makes sense, but you have to choose which upstreams
to trust. For example, the Linux kernel is a "high trust" upstream, Hemel
said, because of its development methodology, including the developer's
certificate of origin and the "Signed-off-by" lines that accompany
patches. There is still some kernel code that is licensed as GPLv1-only,
but there is "no chance" you will get sued by Linus Torvalds, Ted Ts'o, or
other early kernel developers
over its use, he said.
BusyBox is another high trust project as it has been the subject of various
highly visible court cases over the years, so any license oddities have
been shaken out. Any code from the GNU project is also code that he treats
as safe.
On the other hand, projects like the central repository for the Maven build
tool for Java are an example of a low- or no-trust upstream. That
repository is an "absolute mess" that has become a dumping ground for Java
code, with unclear copyrights, unclear code origins, and so on. Hemel
"cannot even describe how bad" the Maven central repository is; it is a
"copyright time bomb waiting to explode", he said.
For his own purposes, Hemel chooses to put a lot of trust in upstreams like
Samba, GNOME, or KDE, while not putting much in projects that pull in a
lot of upstream code, like OpenWRT, Fedora, or Debian. The latter two are
quite diligent about the origin and licenses of the code they distribute, but he
conservatively chooses to trust upstream projects directly, rather than
projects that collect code from many other different projects.
Approach
So, his approach is simple and straightforward: generate a database of
source code file checksums (really, SHA256 hashes) from upstream projects.
When faced with a large body of code with unknown origins, the SHA256 of
the files is computed and compared to the database. Any that are in the
database can be ignored, while those that don't match can be analyzed or further
scanned.
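The whole method fits in a few lines of Python. Here is a rough sketch of
the idea (not Hemel's actual BAT code, and the directory paths are made
up):

    import hashlib
    import os

    def sha256_of(path):
        """Return the SHA256 hex digest of one file's contents."""
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(65536), b''):
                h.update(chunk)
        return h.hexdigest()

    def files_in_tree(root):
        """Yield the path of every regular file below root."""
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                yield os.path.join(dirpath, name)

    # Build the database from pristine upstream trees.
    known = {sha256_of(p) for p in files_in_tree('/srv/upstream/linux-3.8.4')}

    # Scan a vendor code drop: files with known hashes are ignored; the
    # rest form the much smaller set that gets license-scanned or read.
    suspect = [p for p in files_in_tree('/srv/vendor/router-kernel')
               if sha256_of(p) not in known]
    print(len(suspect), "files need closer inspection")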
In terms of reducing the search space, the method is "extremely effective",
Hemel said. It takes about ten minutes for a scan of a recent kernel, which
includes running Ninka and FOSSology on source
files that do not match the hashes in the database. Typically, he finds that
only 5-10% of files are modified, so the search space is quickly reduced by
90% or more.
There are some caveats.
Using the technique requires a "leap of faith" that the upstream is doing
things well, and not every upstream is worth trusting. A good database that contains
multiple upstream versions is time consuming to create and to keep up to
date. In addition, it cannot help with non-source-related compliance
problems (e.g. configuration files). But it is a good tool to help prioritize
auditing efforts, even if the upstreams are not treated as trusted.
He has used the technique for Open
Source Automation Development Lab (OSADL) audits and for other
customers with great success.
Case study
Hemel presented something of a case study that looked at the code on a
Linux-based router made by a "well-known Chinese router manufacturer". The
wireless chip came from a well-known chipset vendor as well. He looked at
three components of the router: the Linux kernel, BusyBox, and the U-Boot
bootloader.
The kernel source had around 25,000 files, of which just over 900 (or 4%)
were not found in any kernel.org kernel version. 600 of those turned out
to be just changes made by the version control system (CVS/RCS/Perforce
version numbers, IDs, and the like). Some of what was left were
proprietary files from the chipset or device manufacturers. Overall, just
300 files (1.8%) were left to look at
more closely.
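Version-control noise of that sort can be filtered mechanically. One
plausible refinement (a sketch of the general idea, not a description of
Hemel's tooling) is to collapse expanded RCS/CVS keywords before hashing,
on both the upstream and the vendor side:

    import hashlib
    import re

    # Expanded keywords look like "$Id: foo.c,v 1.4 2005/03/01 12:00 joe $".
    KEYWORD = re.compile(rb'\$(Id|Revision|Date|Header|Author|Source|Log):[^$]*\$')

    def normalized_sha256(path):
        """Hash a file with RCS/CVS keywords collapsed to their bare form."""
        with open(path, 'rb') as f:
            data = f.read()
        # "$Id: ... $" becomes "$Id$", so files differing from upstream only
        # in expanded keywords hash the same as the pristine upstream copy.
        data = KEYWORD.sub(lambda m: b'$' + m.group(1) + b'$', data)
        return hashlib.sha256(data).hexdigest()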
For BusyBox, there were 442 files and just 62 (14%) that were not in the
database. The changed files were mostly just version control identifiers
(17 files), device/chipset files, a modified copy of bridge-utils, and a
few bug fixes.
The situation was much the same for U-Boot: 2989 files scanned with 395
(13%) not in the database. Most of those files were either chipset vendor
files or ones with Perforce changes, but there were several with different
licenses than the GPL (which is what U-Boot uses). But there is also a
file with the text: "Permission
granted for non-commercial use"—not something that the router maker could
claim. As it turned out, the file was just present in the U-Boot directory
and was not used in the binary built for the device.
Scripts to create the database are available in BAT version
14; a basic scanning script is coming in BAT 15, but it is already
available in the Subversion
repository for the project. Fancier tools are available to Hemel's
clients, he said. One obvious opportunity for collaboration, which did not
come up in the talk, would be to collectively create and maintain a
database of hash values for high-profile projects.
How to convince the legal department that this is a valid approach was the
subject of some discussion at the end of the talk. It is a problem, Hemel
said, because legal teams may not feel confident about the technique even
though it is a "no brainer" for developers. Another audience member suggested
that giving examples of others who have successfully used the technique is
often the
best way to make the lawyers comfortable with it. Also, legal calls, where
lawyers can discuss the problem and possible solutions with other lawyers
who have already been down that path, can be valuable.
Working with the upstream projects to clarify any licensing ambiguities is
also useful. It can be tricky to get those projects to fix files with an
unclear license, especially
when the project's intent is clear. In many ways, "git pull"
(and similar commands) has made it much easier to pull in code from
third-party projects, but sometimes that adds complexity on the legal side.
That is something that can be overcome with education and working with
those third-party projects.
[I would like to thank the Linux Foundation for travel assistance to Tokyo
for LinuxCon Japan.]
At Texas Linux Fest 2013 in Austin, Rikki Endsley from the USENIX
Association spoke about a familiar topic—diversity in
technology companies and software projects—but from a different
angle. Specifically, she looked at how companies recruit new team
members, and the sorts of details that can unintentionally keep
applicants away. Similarly, there are practices that companies can
engage in to help them retain more of their new hires, particularly
those that come from a different background than their co-workers.
A lot of what Endsley said was couched in terms of "hiring," but
she said that it applies to recruiting volunteers to open source
projects as well. As most people are aware, demographic diversity in
technical fields is lower than in the population at large, she said,
and it is particularly low in free software projects. Of course,
these days paid employees do a large share of the work on free
software projects; for companies that manage or produce open source
code, the diversity problem is indeed one of finding, hiring, and
retaining people.
Everyone understands the value of hiring a diverse team, Endsley
said, but a fairly common refrain in technology circles is "we don't
have any women on our team because none applied." Obviously there are
women out there, she noted; the challenge is just to make sure that
they know about one's company and its job opportunities. This can be a
problem in any scientific and engineering field, she said, but it is
particularly troublesome in open source, where the demand for
developers already exceeds the supply. In a job-seeker's market,
companies need to "sell" themselves to the employee, not
vice versa, so if your company is not getting the applicants it would
like to see, you ought to look closely at how you sell yourself, and
be adaptable.
Endsley said that she did not have all of the answers to how to
recruit more diverse applicants, but she did at least have a number of
things that a concerned company could try. Most of her observations
dealt directly with recruiting women, but she said that the advice
applied in general to other demographics as well. She offered
examples that addressed other diversity angles, including ethnicity
and age.
The hunt
Recruiting really begins with identifying what a company needs, she
said. It is tempting to come up with a terse notion of what the new
recruit will do (e.g., "a Python programmer"), but it is better to
consider other facets of the job: representing the company at events,
helping to manage teams and projects, etc. The best plan, though, is
to come up with not one, but three or four "talent profiles," then go
out and change recruiting practices to find the people that fit.
Where one looks for new talent is important. Not everyone who
meets the talent profile is reading job board sites like Monster.com.
Companies can find larger and more diverse pools of potential talent
at events like trade shows and through meetups or personal networking
groups.
In short, "think about where people engage" and go there. After all,
not everyone that you might want to hire is out actively looking for a
job. It can also help to reach out on social networks (where, Endsley
noted, it is the "word of mouth" by other people spreading
news that your company is hiring that offers the real value) and to
create internship programs.
Apart from broadening the scope of the search, Endsley said that a
company's branding can greatly influence who responds to job ads.
Many startups, she said, put a lot of emphasis on the corporate
culture—particularly being the "hip" place to work and having games
and a keg in the break room. But that image only appeals to a narrow
slice of potential recruits. What comes across as hip today is only
likely to appeal to Millennials, not
to those in Generation X or
earlier. In contrast, she showed Google's recruiting slogan, "Do cool
things that matter." It is simple and, she said, "who doesn't want to
do cool things that matter?"
Companies should also reconsider the criteria that they post for
their open positions, she said. She surveyed a number of contacts
in the technology sector and asked them what words they found to be a
turn-off in job ads. On the list of negatives were "rock star,"
"ninja," "expert," and "top-notch performer." The slang terms again
appeal only to a narrow age range, while the survey respondents said
all of them suggest an atmosphere where "all my colleagues will be
arrogant jerks." Similarly, the buzzwords "fast-paced and dynamic"
were often interpreted to mean "total lack of work/life balance and
thoughtless changes in direction." The term "passionate" suggested
coworkers likely to lack professionalism and argue loudly, while the
phrase "high achievers reap great rewards" suggested back-stabbing
coworkers ready to throw you under the bus to get ahead.
Endsley showed a number of real-world job ads (with the names of
the companies removed, of course) to punctuate these points. There
were many that used the term "guys" generically or "guys and gals", which
she said would not turn off all female applicants, but would
reasonably turn off quite a few. There were plenty of laughably bad
examples, including one ad that devoted an entire paragraph to
advertising the company's existing diversity—but did so by
highlighting various employees' interests in fishing, motorcycle-racing,
and competitive beard-growing. Another extolled the excitement of
long work days "in a data center with a rowdy bunch of guys."
Honestly, Endsley observed, "that's really not even going to appeal
to many other guys."
Onboarding and retention
After successfully recruiting an employee, she said, there is still
"onboarding" work required to get the new hire adjusted to the
company, engaged in
the job, and excited about the work. Too often, day one involves
handing the new hire an assignment and walking away. That is
detrimental because research shows that most new hires decide within a
matter of months whether or not they want to stay with a company long
term (although Endsley commented that in the past she has decided
within a few hours that a new company was not for her).
She offered several strategies to get new hires acclimated and
connected early. One is to rotate the new employee through the whole
company a few days or weeks at a time before settling into a permanent
team. This is particularly helpful for a new hire who is in the
minority at the office; for instance, the sole female engineer on a
team would get to meet other women in other teams that she otherwise
might not get to know at all. Building those connections makes the
new hire more likely to stay engaged. It is also helpful to get the
new hire connected to external networks, such as going to conferences
or engaging in meetups.
Retaining employees is always an area of concern, and Endsley
shared several strategies for making sure recent hires are
happy—because once an at-risk employee is upset, the chances are
much higher that the company has already lost the retention battle.
One idea is to conduct periodic motivation checks; for example, in the
past USENIX has asked her what it would take for her to leave for
another job. Checks like these need to be done more than once, she
noted, since the factors that determine whether an employee stays or
leaves change naturally over time. Companies can also do things to
highlight the diversity of their existing employees, she said; Google
is again a good example of doing this kind of thing right, holding
on-campus activities and events to celebrate different employees'
backgrounds, and cultivating meetup and interest groups.
Another important strategy is to have a clear and fair reward
system in place. No one likes finding out that a coworker makes more
money for doing the same work solely because they negotiated
differently during the interview. And it is important that there be
clear ways to advance in the company. If developers cannot advance
without shifting into management, they may not want to stay. Again,
most of these points are valuable for all employees, but their impact
can be greater on an employee who is in the minority—factors
like "impostor syndrome" (that is, the feeling that everyone else in
the group is more qualified and will judge you negatively) can be a
bigger issue for an employee who is
already the only female member of the work group.
The audience asked quite a few questions at the end of the
session. One was from a man who had concerns that hiring for
diversity can come across as hiring a token member of some demographic
group. Endsley agreed that it can certainly be interpreted that way—if done wrong. But her point
was not to give advice to someone who would think "I need two more
women on my team," but to someone who is interested in hiring from a
diverse pool of applicants. That is, someone who says "I have no
women on my team, and none are applying; what am I doing wrong?" Most
people these days seem to agree on the benefits of having a diverse
team, but most people still have plenty of blind spots to
address. But with demand for developers working on open source
code exceeding supply, successfully reaching the widest range of
possible contributors is a wise move anyway.