
Medallia and Redhat Talk Docker+Ceph

Posted Jul 2, 2015 4:51 UTC (Thu) by b7j0c (guest, #27559)
In reply to: Medallia and Redhat Talk Docker+Ceph by kreide
Parent article: News and updates from DockerCon 2015

Thanks for the slides. Did you weigh the costs of Ceph against just dumping objects into S3 or a similar online object-storage system? These don't excel at latency and are no replacement for a filesystem, but they are basically unbeatable on development cost, operational cost, and simplicity.



Medallia and Redhat Talk Docker+Ceph

Posted Jul 2, 2015 17:05 UTC (Thu) by krakensden (subscriber, #72039) [Link] (18 responses)

Object storage doesn't really solve the problem of "how do we improve the way we deal with our databases", though, and it doesn't provide enough features to let you jettison MySQL/Postgres and the like.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 3, 2015 18:00 UTC (Fri) by b7j0c (guest, #27559) [Link] (17 responses)

Those are good points, but even then, I would have sought out something hosted like Aurora or Redshift. Going through these slides I felt like asking "why are you even building this?".

Medallia and Redhat Talk Docker+Ceph

Posted Jul 4, 2015 0:08 UTC (Sat) by Lennie (subscriber, #49641) [Link] (16 responses)

Because if you choose Aurora or Redshift, you have effectively glued your application to AWS instead of making it more portable.

Aurora and Redshift are heavily based on MySQL and PostgreSQL, but you are still putting someone else in control of how you manage your data and which features are available.

You can test it on your laptop with regular MySQL or PostgreSQL, but does that mean it will run the same on AWS?

What if you want to move to another provider?

Maybe not even a cloud provider but a hosting provider (or your own datacenter), because running on dedicated servers is actually cheaper.

No, long term the right way is probably to deploy containers that manage themselves (like an appliance):

https://flynn.io/docs/postgres#design

I personally think the storage layer is the wrong place to solve this problem.

It's better to deploy containers across multiple servers and configure the database server to replicate between them.
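For PostgreSQL, that reconfiguration is mostly a handful of settings; here is a minimal sketch of 9.x-era streaming replication (the hostname and replication user are made up, and the primary's pg_hba.conf also needs a "replication" entry for each replica):

    # postgresql.conf on the primary
    wal_level = hot_standby
    max_wal_senders = 3

    # postgresql.conf on each replica
    hot_standby = on

    # recovery.conf on each replica
    standby_mode = 'on'
    primary_conninfo = 'host=pg-primary.example port=5432 user=replicator'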

Medallia and Redhat Talk Docker+Ceph

Posted Jul 6, 2015 2:28 UTC (Mon) by b7j0c (guest, #27559) [Link] (15 responses)

It's true that hosted cloud services can be viewed as a "roach motel", so to speak, but no more so than Linux, Postgres, or any other technical commitment you make, modulo the monthly fees. Access to source code really isn't an issue; most organizations are not going to fork their own kernel or RDBMS, for a variety of reasons.

In the coming years, those who take the time to "grow their own" starting from bare metal will find that their target markets have already been staked out by the time their architectures are ready to accept business... by those who simply bit the bullet and let a cloud provider run the commodity functionality so they could get to market faster. Being portable doesn't matter much when you're irrelevant.

Many companies that "roll their own" don't have a NOC, don't have failover, don't have 24x7 ops staff in all regions, don't have redundancy, and so on. They end up with worse security too, because security is often reactive, and without a security-focused NOC they stay vulnerable.

What do you do when your storage maxes out and the vendor can't ship new units to you in time? AWS will never tell you the database is full.

At this point I would only advocate roll-your-own to hobbyists or to those in noncompetitive/solved markets without realtime ops or scalability requirements. Docker, in my mind, is just a speed bump... it only isolates your complexity, it doesn't reduce it. Hosted services will make most of this tooling irrelevant in a few years, as people realize that being able to turn on an infinitely scalable DB in thirty seconds (versus growing a bare-metal architecture over months) far outweighs the issues of lock-in.

It would be interesting to see a real case study on the costs and benefits of growing a bare-metal architecture versus using hosted services. Factoring in time to market, ops obligations, and the opportunity cost of capital (every person and every dollar committed to building a bare-metal architecture could instead have been focused on real business objectives), I would assume a project like the one described in the slides is more expensive than a hosted one, to the tune of millions.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 6, 2015 17:17 UTC (Mon) by drag (guest, #31333) [Link] (9 responses)

You are kind of contradicting yourself.

You say things like "cloud provider run the commodity functionality"... but proprietary databases are _not_ commodity. Things like MySQL or PostgreSQL are commodity functionality. Not until you can provide the same APIs as Redshift yourself, using off-the-shelf open-source software, can it be considered commodity.

If you don't mind tying all your containers to a proprietary solution, that is fine, but if you are actually talking about commodity stuff, then that means APIs and software that _any_ cloud provider can provide... including yourself.

> What do you do when your storage maxes out and the vendor can't ship new units to you in time? AWS will never tell you the database is full.

You can take that into account with a 'hybrid' approach.

Using a public cloud, a private cloud, or just a Linux server running containers at home... these are not mutually exclusive. None of this stuff is either-or. It only becomes either-or if you choose to let your applications depend on proprietary features.

You use software, databases, and APIs that are available both from cloud providers and at home. Take S3, for example. You can get it at Amazon, but you can also get it by running Swift. That means that as long as your application sticks to the subset Swift can provide, you can use _any_ cloud provider you want. You are not stuck with Amazon. If your external network dies, you don't have to send everybody home. It's all still there locally.
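To make that concrete, here is a minimal sketch using the boto3 SDK (the endpoint, bucket, and credentials are all made up). The same few lines talk to Amazon S3 or to a local Swift/Ceph cluster exposing the S3 API; only the endpoint and credentials change:

    import boto3  # assumes the boto3 SDK is installed

    # Point the client at a private S3-compatible endpoint; omit
    # endpoint_url (and use real credentials) to talk to Amazon itself.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://swift.internal.example:8080",
        aws_access_key_id="AKIA...",
        aws_secret_access_key="...",
    )
    s3.put_object(Bucket="kittens", Key="cat1.jpg",
                  Body=open("cat1.jpg", "rb"))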

That way you can run your own servers locally, keep them pretty much maxed out at 80-90% capacity, and when you need more capacity you just rent it. Believe it or not, you can do it cheaper locally if you are smart about it. Use whitebox servers and whitebox switches... the same stuff the "big boys" use; they don't have an exclusive lock on it. You can't do it, of course, if you are running something like "Cisco blade servers" or some such nonsense. This way you don't have to pay the profit overhead of having Amazon do it for you, and you still don't have to pay for unused, redundant, "just in case" capacity yourself.

If you use an RDBMS, then keep an Amazon VPS or Rackspace instance or whatever active and have it replicate the stuff you have on site. If some other cloud provider has a fire sale on capacity, you can take advantage of it instantly without having to bother your programmers or change anything in your business. Just get an account and deploy.

If you tie yourself down to proprietary Amazon services that can't be replicated by other people or yourself using open source software then you are locked in.

Amazon and the other big providers do have massive economies of scale, but so does open source (when it's done correctly). "Distributed" software shouldn't just mean "one cloud vendor". It should really be distributable across _anything_ you want.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 6, 2015 21:38 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> You use software, databases, and APIs that are available both from cloud providers and at home. Take S3, for example. You can get it at Amazon, but you can also get it by running Swift. That means that as long as your application sticks to the subset Swift can provide, you can use _any_ cloud provider you want.
In theory. In practice, it doesn't work this way.

All the object stores have some ...erm... peculiarities. For example, S3 is very 'eventually consistent', and if you think that you'll be able to access an object immediately after you write it, you'll get some nasty surprises. And they are especially nasty because they happen only under high load.
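Here is a sketch of the kind of defensive code that forces on you (boto3-style, all names made up): you cannot assume a GET right after a PUT will succeed, so you retry with backoff:

    import time

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    def get_when_visible(bucket, key, attempts=8):
        # Retry until the eventually-consistent object finally shows up.
        for i in range(attempts):
            try:
                return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            except ClientError as err:
                if err.response["Error"]["Code"] != "NoSuchKey":
                    raise
                time.sleep(0.1 * 2 ** i)  # exponential backoff
        raise RuntimeError("%s/%s never became visible" % (bucket, key))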

Then there's the question of performance. Individual S3 connections are fairly slow (2-10 megabytes per second, tops), but you can have literally thousands of them and they scale almost indefinitely. You can have a cluster of Hadoop servers hammering S3 and it will work just fine. If you try the same pattern locally, you're going to be surprised.
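Exploiting that means issuing lots of concurrent requests; a sketch (bucket and key names made up), where each slow stream contributes to a large aggregate throughput:

    import concurrent.futures

    import boto3

    s3 = boto3.client("s3")  # boto3 clients can be shared across threads
    keys = ["logs/part-%04d" % i for i in range(1000)]

    def fetch(key):
        return s3.get_object(Bucket="my-bucket", Key=key)["Body"].read()

    # 64 streams at 2-10 MB/s each add up to real aggregate bandwidth.
    with concurrent.futures.ThreadPoolExecutor(max_workers=64) as pool:
        blobs = list(pool.map(fetch, keys))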

Then there's the question of metadata. While GETs and PUTs on Amazon S3 are insanely fast, metadata operations (LIST) are limited to about 100 per second before you are throttled. Again, that's not something you'd expect from a single server with an SSD.

So building abstractions at the level of the cloud API is a doomed enterprise.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 7, 2015 3:20 UTC (Tue) by b7j0c (guest, #27559) [Link]

No one does LISTs on S3 as a matter of practice; the pain of doing so is spelled out even in the introductory materials. You keep a reference to your objects in a database.
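In other words, something as simple as a table written at upload time (sqlite here purely as a stand-in; the schema is illustrative), so that finding objects later is a WHERE clause rather than a throttled LIST:

    import sqlite3

    db = sqlite3.connect("objects.db")
    db.execute("""CREATE TABLE IF NOT EXISTS objects (
                      key     TEXT PRIMARY KEY,
                      bucket  TEXT NOT NULL,
                      size    INTEGER,
                      created REAL)""")

    def record_upload(bucket, key, size, timestamp):
        # One row per PUT; queries hit the database, never the object store.
        db.execute("INSERT OR REPLACE INTO objects VALUES (?, ?, ?, ?)",
                   (key, bucket, size, timestamp))
        db.commit()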

S3, on the other hand, has many more nines in its uptime history than the hardware most people manage themselves, and it is insanely cheap. It's one of the few "no-brainer" AWS services, which is why so many people use it.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 9, 2015 19:56 UTC (Thu) by drag (guest, #31333) [Link]

> For example, S3 is very 'eventually consistent'

yeah.

All distributed systems have their limits.

As they usually say, you have three choices (something like):

1. Speed

2. Scalability/distribution

3. Consistency

You get to pick two.

Things like S3 and Swift sacrifice consistency for the ability to distribute data globally and still have it reasonably fast, so they settle for 'best effort' or 'eventual' consistency. This is great for uploading images of kittens, containers, or JavaScript libraries for the web app you are going to release next week, but it's lousy if you want to mount it and use it as a POSIX-like filesystem. On the upside, they distribute stuff globally and can optimize things so that people fetching data hit the closest/most economical servers first.

Which is why there is still a place for things like Ceph, which has a high degree of consistency. So much so that you can run a conventional POSIX filesystem on top of it. It's not something you can 'naturally' extend across multiple datacenters, but if you want a way to take advantage of a bunch of cheap JBOD arrays to share VMs across a few hundred nodes in one datacenter, it's going to work much better than most.

Ceph offers an S3-style API of its own, but I feel it would be silly and expensive to use it beyond something small. You would get the worst of both worlds: you wouldn't be able to take advantage of Ceph's consistency guarantees, while at the same time you couldn't take advantage of the ability to distribute S3-style storage cheaply.

So as long as you combine like with like and the API is good, application developers shouldn't run into many problems.

It's like Ext3 and ZFS: even though they are radically different under the hood, they still conform to most POSIX conventions, at least enough that for most purposes no reprogramming of applications is required.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 6, 2015 21:40 UTC (Mon) by Lennie (subscriber, #49641) [Link]

drag has a great reply; those are the kinds of things I would have mentioned. But I just know b7j0c is going to bring up one part again: security.

But I think containers are a great way to get better security too, because of better componentization.

Why? Because you are only running one process/application (in the case of application containers like Docker and trends like microservices). This really reduces the attack surface over the network. And because these are single-purpose applications, we (the community) have the potential to allow these processes to do only what they are supposed to do.

Remote shell? There is no shell in your container, and maybe most syscalls aren't even allowed, if we create good profiles.
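Even with the knobs Docker already ships you can get part of the way there; a rough sketch (the image name is made up):

    docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE \
               --read-only example/webapp

Drop every capability, add back the one the service actually needs, and make the root filesystem read-only; a good syscall profile on top of that would shrink the surface further.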

Just as Docker-like application containers can make deployment easier, they have the potential to make security easier.

Another reason: if you have a Docker container with PostgreSQL (possibly replicated to multiple machines), how many people will have to write a security profile for running such a service? Not many, if we have a way to share them.

The Docker security team also talked about this, I believe in this video: https://www.youtube.com/watch?v=8mUm0x1uy7c

Those two came from Square, and as part of the security team there they tried to make deploying security easy:

https://www.youtube.com/watch?v=lrGbK6fE7bI

One of the things they did was mutual TLS authentication between all services, with frequent key rollovers.
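I'm not claiming this is Square's code, but the server side of mutual TLS boils down to something like this in Python (the certificate paths are made up):

    import socket
    import ssl

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
    ctx.load_cert_chain(certfile="service.crt", keyfile="service.key")
    ctx.load_verify_locations(cafile="internal-ca.crt")
    ctx.verify_mode = ssl.CERT_REQUIRED  # refuse peers without a valid client cert

    srv = socket.socket()
    srv.bind(("0.0.0.0", 8443))
    srv.listen(5)
    conn, addr = srv.accept()
    tls = ctx.wrap_socket(conn, server_side=True)  # handshake authenticates both ends

Frequent key rollover then just means reissuing those certificates on a short schedule.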

I guess it's their attempt at the OpenBSD motto: secure by default.

Something else I like is what one of the people working on the HP OpenStack cloud (including the public one) said:
What do you mean, log into a machine? You can't. It just sends error logs to a log server; there is no remote login.
These are all machines deployed and configured with automated tooling (like configuration management). At install time (actually they use imaging) each machine pulls its configuration over the network and configures itself.

If you look at, for example, RancherOS: they run nothing on the host. It's all containers.

I think there is a lot of potential to get it right. Will we end up at that place... I don't know.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 7, 2015 3:16 UTC (Tue) by b7j0c (guest, #27559) [Link] (4 responses)

> You can take that into account with a 'hybrid' approach.
> Using a public cloud, a private cloud, or just a Linux server running containers at home... these are not mutually
> exclusive. None of this stuff is either-or. It only becomes either-or if you choose to let your applications depend
> on proprietary features.

I am fully convinced that hybrid clouds are a loser solution. None of the "big three" (AWS, GCE, Azure) are emphasizing them, for good reason: most people don't want them, and they are both an architectural and a business dead end. Red Hat is promoting hybrids partially because it's all they *can* do... they simply don't have the resources to go build dozens of giant datacenters.

Hybrid clouds just move your complexity around and mutate it by coupling it to new, immature tools (of which Docker is one). What do you get? You've got the "lock-in" you dread by using OpenShift or whatever, yet you also have to provision a seven-figure budget for ops and hardware, and you'll still probably do a crappier job than AWS.

I'd rather just roll my own than deal with something like OpenShift... I have zero interest in dealing with tens of thousands of lines of alpha-quality code, open source or not.

That's not to say AWS, GCE, and Azure are without flaws, but the biggest danger so far is that they could gouge you once they have all your data and operations internalized. So far competition is keeping that from happening, and frankly all of these vendors realize that abusing customers at this stage of the market would be suicide.

Indeed, if you look at some of the stuff AWS is doing with Lambda and Kinesis... I don't think it is even possible for a single entity to roll out something like that on its own; the only way you get stuff like that is with a big bet.

> If you use a RDBS then keep a Amazon VPS or Rackspare or whatever active and have it replicate the stuff you have on site.
> If some other 'clouder provider' has some fire sale on capacity you can take advantage of it instantly without having to
> bother your programmers or change anything in your business. Just get a account and deploy.

Why? You can usually just pick up the phone and have your AWS rep start discounting you. It's easier than playing roulette with your business by hunting around for cheap VPS hosts.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 9, 2015 20:04 UTC (Thu) by drag (guest, #31333) [Link] (3 responses)

> Hybrid clouds just move your complexity around and mutate it by coupling it to new, immature tools (of which Docker is one). What do you get? You've got the "lock-in" you dread by using OpenShift or whatever, yet you also have to provision a seven-figure budget for ops and hardware, and you'll still probably do a crappier job than AWS.

If you need a seven-figure budget for running your own cloud, you'll need a seven-figure budget for running on AWS... if you are doing your job right. Amazon isn't doing anything magical. Their advantage is their experience and their massive scale, but they still need to turn a profit, and they have no access to hardware or resources that you can't get yourself through other means.

It's not a slam-dunk either way. Going public cloud doesn't make expenses go to zero.

> Why? You can usually just pick up the phone and have your AWS rep start discounting you.

And when the AWS rep looks at your account and sees that your organization has built its business around services you can only get from Amazon... how well do you think the negotiations are going to work out for you?

If he knows what he is doing, any discount you get is going to be directly related to the expense of porting your applications to a different platform.

> It's easier than playing roulette with your business by hunting around for cheap VPS hosts.

Making a phone call is certainly easier, but your success in negotiations is going to be directly related to how much, or how little, risk switching away poses for you.

What happens when they call your bluff?

You: "Hey AWS rep, so-and-so is offering their entitlements at half the cost of yours. My boss was looking to move away, but I like you guys... what can you offer me?"

AWS: "Tell your boss: Good luck with that"

Medallia and Redhat Talk Docker+Ceph

Posted Jul 9, 2015 23:20 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> If you need a seven-figure budget for running your own cloud, you'll need a seven-figure budget for running on AWS...
It's very hard to run reliable infrastructure cheaper than Amazon. If you need a VPS for a private website that you don't care about, then you can get something cheaper. If you need a single dedicated server for some important but not uptime-critical task, then you can easily do it cheaper than AWS.

However, if you need redundant infrastructure and/or multiple servers, then AWS becomes hard to beat. If you add spot instances to the mix, it becomes downright the cheapest option.

> You: "Hey AWS rep, so-and-so is offering their entitlements at half the cost of yours. My boss was looking to move away, but I like you guys... what can you offer me?"
Then it's quite possible that $OTHERCOMPANY is simply going to take a loss to get the client. AWS sales reps are actually quite good at offering constructive solutions, like using EC2 Spot or buying reserved instances.

But in the worst case, if you have a seven-figure-per-month cluster, then you can probably afford the engineering time to move it to one of the competitors. Microsoft and Google have services comparable to Amazon's.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 9, 2015 23:45 UTC (Thu) by dlang (guest, #313) [Link] (1 responses)

> It's very hard to run reliable infrastructure cheaper than Amazon.

Not really; you just need a lot of bandwidth usage to drive your AWS bill through the roof.

Amazon is not using magic. There is some economy of scale involved in their purchasing, but it's really easy to spend a lot more money hosting things in AWS than hosting them yourself.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 12, 2015 1:12 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

> Not really; you just need a lot of bandwidth usage to drive your AWS bill through the roof.
The only major cost you can't easily engineer around is the price of outgoing data. It's fairly expensive, while incoming data is free. Also, intra-zone traffic is free on AWS, Google Compute, and Microsoft Azure.

> Amazon is not using magic. There is some economy of scale involved in their purchasing, but it's really easy to spend a lot more money hosting things in AWS than hosting them yourself.
It's easy to do, especially if you are doing stupid stuff. If you're using AWS smartly, it becomes VERY hard to beat.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 8, 2015 0:51 UTC (Wed) by Lennie (subscriber, #49641) [Link] (4 responses)

Something else I didn't say.

The three major cloud providers, and some of the larger hosting providers, are all from the US. And I'm not from the US, which makes me a foreigner.

You know what that means? It turns out I have none of the rights you might have (my guess is that you are a US citizen):

http://media.ccc.de/browse/congress/2014/31c3_-_6195_-_en...

Maybe it's the land of the free to you, but I just don't have the same rights.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 8, 2015 19:26 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

All of the hosting providers have datacenters around the world.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 8, 2015 19:32 UTC (Wed) by Lennie (subscriber, #49641) [Link] (2 responses)

Do you mean all the large US providers have datacenters around the world ?

Why do you think that matters ?

The only thing that matters is that they are US companies, not where the data is kept.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 8, 2015 20:21 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

That is debatable. Microsoft is fighting the US government: http://www.zdnet.com/article/microsoft-fights-us-effort-t...

This case is still in the court system.

Medallia and Redhat Talk Docker+Ceph

Posted Jul 8, 2015 22:34 UTC (Wed) by Lennie (subscriber, #49641) [Link]

I really doubt they'll win.

It's not quite the same situation as 'cloud computing', so other laws come into play, but still:

there are laws in the US that clearly state that these rights apply only to US citizens in the US.

Basically saying: everyone else is outlawed.

How about a little clip straight out of Congress?

https://www.youtube.com/watch?v=ijr0E6Lw4Nk#t=15m00s

