
5.3 Merge window, part 1

By Jonathan Corbet
July 12, 2019
As of this writing, exactly 6,666 non-merge changesets have been pulled into the mainline repository for the 5.3 development cycle. The merge window has thus just begun, but there is already quite a bit in the way of interesting changes to look at. Read on for a list of what has been merged so far.

Architecture-specific

  • The x86 umonitor, umwait, and tpause instructions are now supported for use by user-space code; they make it possible to efficiently execute small delays without the need for busy loops. A knob has been provided to allow system administrators to control the maximum period for which the CPU can be paused.
  • The PA-RISC architecture now supports dynamic ftrace.

Core kernel

  • The scheduler utilization clamping patch set has been merged. This feature allows an administrator to cause specific processes to appear to create more or less load than they actually do; that, in turn, will affect how the CPU frequency governor responds when those processes become runnable. So, for example, an interactive process could be made to appear to have heavy CPU requirements, causing an immediate increase in CPU frequency when that process wakes up.
  • The pidfd_open() system call has been added; it allows a process to obtain a pidfd for another, existing process. It is also now possible to use poll() on a pidfd to get notification when the associated process dies.
  • Also added is the clone3() system call, which reorganizes the clone() interface, makes it more extensible, and adds space for more flags.
  • The new bpf_send_signal() helper allows a BPF program to send a signal to an arbitrary process.
  • The BPF verifier is now able to handle programs with loops, as long as the execution of the loop is bounded and cannot cause the program to exceed the maximum instruction count; that removes a major limitation that has irritated BPF developers for some time. Note that this is not the bounded-loop work that was under discussion late last year; it is a new implementation. It seems that the verifier efficiency improvements merged for 5.2 made this task rather simpler.

Filesystems and block layer

  • The NFSv4 server now creates a directory under /proc/fs/nfsd/clients with information about current NFS clients, including which files they have open.

Hardware support

  • Audio: Conexant CX2072X codecs, Realtek RT1011 and RT1308 amplifiers, and Cirrus Logic CS47L35, CS47L85, and CS47L90 codecs.
  • Industrial I/O: Infineon DPS310 pressure and temperature sensors, Analog Devices ADF4371 and ADF4372 wideband synthesizers, Analog Devices AD8366 gain amplifiers, and ChromeOS EC lid-angle sensors.
  • Media: Allegro DVT video control units and Amlogic video decoders.
  • Miscellaneous: Freescale i.MX8 DDR performance monitors, Renesas RZ/A1 interrupt controllers, Annapurna Labs fabric interrupt controllers, Atmel SHA204A random-number generators, TI LM3697 and LM36274 LED controllers, Dialog Semiconductor SLG51000 regulators, Socionext SynQuacer SPI controllers, Freescale i.MX8M CPU-frequency controllers, Infineon PXE1610 voltage regulators, Infineon IRPS5401 power-management ICs, NXP i.MX8 SCU on-chip OTP controllers, Mixel MIPI DSI PHYs, Fairchild Semiconductor FSA9480 microUSB switches, and ChromeOS embedded controllers.
  • Networking: NXP TJA11xx PHYs, Google Virtual NICs, and Hisilicon HI13X1 network interfaces.
  • USB: Qualcomm PCIe Gen2 PHYs.
  • Removals: the isdn4linux ISDN driver subsystem has been removed entirely; it doesn't appear to have been used for some time. The separate CAPI subsystem is also on its way out, but it has only been moved to the staging directory for now. The mISDN subsystem will remain for now. See this commit for details.

Networking

  • The kernel will now accept IPv4 addresses in the 0.0.0.0/8 range as valid. Getting the Internet as a whole to allow that is a work in progress but, once it happens, it will make 16 million more IPv4 addresses available for use.
  • It is now possible to attach BPF programs (at the control-group level) to the setsockopt() and getsockopt() system calls. That allows the imposition of administrator policy on those calls; see this commit for some documentation.
  • There is also a new socket-level hook to call a BPF program once every round-trip-time interval.

Security-related

  • Cryptographic keys can now be tied to a specific user or network namespace, making them unavailable outside of that namespace. Keys are also now protected by access control lists; see this commit for details. (Note that the ACL patch was subsequently reverted, though it may be back before the end of the merge window.)

Internal kernel changes

  • force_sig() has always taken the target task as a parameter, but it has never actually been safe to use for anything other than the current task. That parameter has been removed and a large number of callers have been updated.

Linus Torvalds has been a little grumpy during this merge window, having encountered multiple regressions that affected his machine. Most of those have been worked out for now; with luck things will go more smoothly from here on out. If the usual schedule holds, the 5.3 merge window will close on July 21, with the final 5.3 release expected in early-to-mid September.

Index entries for this article
KernelReleases/5.3



5.3 Merge window, part 1

Posted Jul 12, 2019 22:34 UTC (Fri) by meyert (subscriber, #32097) [Link] (55 responses)

"The kernel will now accept IPv4 addresses in the 0.0.0.0/8 range as valid. Getting the Internet as a whole to allow that is a work in progress but, once it happens, it will make 16 million more IPv4 addresses available for use."
I would like to know more!
Commit? Background/context?

0.0.0.0/8

Posted Jul 12, 2019 22:44 UTC (Fri) by corbet (editor, #1) [Link] (1 responses)

See this post. Commit 96125bf9985a.

0.0.0.0/8

Posted Jul 16, 2019 23:30 UTC (Tue) by mtaht (subscriber, #11087) [Link]

thx jon for the public steer to the cover letter. next time something like this explodes across the internet I'll include way more info in the actual commit!

5.3 Merge window, part 1

Posted Jul 12, 2019 23:26 UTC (Fri) by bartoc (guest, #124262) [Link] (21 responses)

Hopefully IANA charges an arm and a leg for addresses in the 0/8 range. Let's get more IPv6 single-stack hosts and services!

5.3 Merge window, part 1

Posted Jul 13, 2019 1:05 UTC (Sat) by shentino (guest, #76459) [Link] (20 responses)

Allowing IPv4 addresses to be owned in any sense of the word was the first mistake. Now that a black market has been established people sitting on hoards they got during the times of plenty are milking them for all they can get and won't give them up without a fight.

5.3 Merge window, part 1

Posted Jul 13, 2019 8:19 UTC (Sat) by cpitrat (subscriber, #116459) [Link] (17 responses)

Which also gives them an incentive to slow down IPv6 deployment if they can ...

5.3 Merge window, part 1

Posted Jul 13, 2019 9:10 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (16 responses)

Not really. Even if class D networks are allowed (which would require patching for unsupported but still ubiquitous systems), all the new IP supply will be used within months.

5.3 Merge window, part 1

Posted Jul 13, 2019 20:46 UTC (Sat) by cesarb (subscriber, #6266) [Link]

Do you mean class E? Class D is multicast.

5.3 Merge window, part 1

Posted Jul 14, 2019 2:41 UTC (Sun) by naptastic (guest, #60139) [Link] (14 responses)

I'm assuming you mean class E because making class D routable would break applications that use multicast. (I use Netjack, so I'd be affected.)

5.3 Merge window, part 1

Posted Jul 16, 2019 4:49 UTC (Tue) by riking (subscriber, #95706) [Link] (1 responses)

Clawing back multicast assignments into unicast-routable addresses is an explicit goal of this effort, yes.

5.3 Merge window, part 1

Posted Jul 16, 2019 8:01 UTC (Tue) by naptastic (guest, #60139) [Link]

Is the intended alternative ff00::/8?

5.3 Merge window, part 1

Posted Jul 16, 2019 23:16 UTC (Tue) by mtaht (subscriber, #11087) [Link] (4 responses)

1) Amazon AWS treats the entire ipv4 address space as a unicast playground. Very, very few userspace stacks actually do much checking of the IP addresses in the first place.

2) From our testing and from source code inspection of all the source code in the world (amazing what we can do nowadays) we've only found *1* application that used a multicast address in the 225/8 - 231/8 address range, which was reserved for future multicast use and never allocated for anything by IANA in the great multicast-everything orgy of the late 80s.

Do let us know if the multicast portion of this patch here breaks anything:

https://github.com/dtaht/unicast-extensions/blob/master/p...

After fixing 240/4 last december....

We submitted the 0/8 patch in the hope that we would stimulate discussion of the more advanced patches, and it, um, went right upstream (0/8 really is uncontroversial, and john's been waiting for the standard to be changed since he co-invented bootp), and NOW we're getting the discussion on various forums over the last few days and I'm trying to keep up....

5.3 Merge window, part 1

Posted Jul 17, 2019 10:53 UTC (Wed) by naptastic (guest, #60139) [Link] (3 responses)

I can understand how designers of the Internet would be cautious and disallow 0/8, fearing ambiguity. That was a wise choice. Now that we know more, and we're confident 0/8 will work, we're allowing it. This is a wise choice.

Thinking about it more, my anxiety about multicast isn't warranted. All my applications that use multicast are IPv6-capable. If I, as the system administrator, am not willing to set up my network for IPv6 multicast, I have no room to complain.

So... before this thread, I was opposed to reclaiming multicast IPv4 ranges, but having learned more from you wonderful folks, I am now totally indifferent on the matter. Thank you!

(It might be mildly inconvenient if I have to reconfigure my firewall if other classes of networks get reclaimed. Meh: Progress necessitates inconvenience. MEH, I SAY!)

5.3 Merge window, part 1

Posted Jul 17, 2019 16:41 UTC (Wed) by luto (guest, #39314) [Link] (2 responses)

The CME uses multicast. See here, for example:

ftp://ftp.cmegroup.com/SBEFix/Production/Configuration/co...

5.3 Merge window, part 1

Posted Jul 17, 2019 21:34 UTC (Wed) by mtaht (subscriber, #11087) [Link]

The 224/8 and 239/8 ranges were allocated and used for some still common multicast applications. 232/8-238/8 are mapped for a variety of mostly, but not entirely, obsolete uses: https://www.iana.org/assignments/multicast-addresses/mult...

As I said, 225/8-231/8 appear to be entirely unused in the world. 120m addresses.

Let's make 'em unicast!

5.3 Merge window, part 1

Posted Jul 17, 2019 21:35 UTC (Wed) by mtaht (subscriber, #11087) [Link]

(to be clear, that particular multicast application uses addresses in the 233 range)

5.3 Merge window, part 1

Posted Jul 16, 2019 23:19 UTC (Tue) by mtaht (subscriber, #11087) [Link] (6 responses)

re: 240/4 (formerly class E), has basically worked on most OSes, except windows, since 2008. In linux, it needed one trivial patch (landed last december) to work correctly with ifconfig (it already worked with ip route) and 240/4 has now been extensively tested in openwrt.

Only a few routing daemons and related routing hardware need to be modified to make 240/4 globally routable. Babeld already has support, the FRR and bird folk have agreed in principle to make it work, juniper works with a flag, cisco's enterprise routers work, smaller-scale ones don't, currently.

5.3 Merge window, part 1

Posted Jul 16, 2019 23:40 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

So you're looking at at least 3-4 more years for it to become available. Routers are not updated often and quite a few Linuxes stay at the same kernel version for years.

Is it worth it now?

5.3 Merge window, part 1

Posted Jul 17, 2019 3:23 UTC (Wed) by mtaht (subscriber, #11087) [Link] (4 responses)

The best linear projection for 100% ipv6 adoption in one country was 7 years. Others, decades. So... yes, I think cleaning up ipv4 as best we can is needed.

5.3 Merge window, part 1

Posted Jul 17, 2019 16:42 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

Right now it's at 30% with about 10% YoY growth: https://www.google.com/intl/en/ipv6/statistics.html

So yes, 7 years is about right. However, this assumes linear growth and it's probably not. For example, China has an official plan to move to IPv6 by 2025 with major deployments starting next year.

5.3 Merge window, part 1

Posted Jul 18, 2019 6:19 UTC (Thu) by cpitrat (subscriber, #116459) [Link]

Well, 2025 is in 6 years, so not that far from 7 years ...

5.3 Merge window, part 1

Posted Jul 17, 2019 20:31 UTC (Wed) by farnz (subscriber, #17727) [Link]

The trouble is that if you look at Geoff's IPv4 Address Report, we were consuming more than a /8 per month until we ran out of free-floating addresses. So cleaning up IPv4 needs to provide 84 /8s to get us to 7 years - for decades, you're looking at hundreds of /8s, which is going to get awkward…

5.3 Merge window, part 1

Posted Jul 18, 2019 23:02 UTC (Thu) by naptastic (guest, #60139) [Link]

I'm looking at a map of the Internet as of 2018, and I see a lot of empty space that I wonder if we could get by either asking nicely, offering money, or passing legislation. (I count nine /8's belonging to US agencies that appear completely dark on the map. How much does the US government really need for all its agencies? Every agency in every state plus federal agencies is still less than 2^16. I'd bet companies like Ford, HP, maybe Apple, would be willing to take money for smaller allocations; no code has to change; and whoever buys control of those ranges can recoup the cost by selling smaller allocations.) Maybe I'm being Pollyannaish but I see win-win situations that would require nothing more than cooperation.

5.3 Merge window, part 1

Posted Jul 13, 2019 9:25 UTC (Sat) by dottedmag (subscriber, #18590) [Link] (1 responses)

A black market? Why does one need a black market for property? Black markets only exist when the property rights, especially rights to sell/buy property are seriously constrained.

5.3 Merge window, part 1

Posted Jul 14, 2019 1:24 UTC (Sun) by naptastic (guest, #60139) [Link]

Web hosting veteran here. I worked for a company that added dedicated server hosting to their already-existing shared hosting solution. Their top-of-rack switches were consumer-grade unmanaged switches, and for a while we had customers stealing each others' IP addresses.

I dunno about a "black market" but people sending spam or doing SEO are still willing to beg, borrow, or steal for IPs, given the chance.

5.3 Merge window, part 1

Posted Jul 13, 2019 9:38 UTC (Sat) by jem (subscriber, #24231) [Link] (17 responses)

Sigh. This reminds me of the old fable The Tortoise and the Hare. Millions of new available addresses may sound impressive, but not so impressive when you realize they will be used up in a couple of weeks. Not to mention that "Getting the whole internet to allow that" is something that I don't think will ever happen.

The effort spent on this would have been more wisely spent on promoting IPv6. More usable addresses... no, more usable /64 subnets would have been put to use for less effort. Granted, these addresses would be unreachable to some, but we will in any case inevitably reach a point in the future where IPv4 isn't capable of holding the growing Internet together.

5.3 Merge window, part 1

Posted Jul 13, 2019 10:20 UTC (Sat) by sytoka (guest, #38525) [Link] (15 responses)

If IPv6 had been retro compatible with IPv4, no problem of this type would have happened ...

5.3 Merge window, part 1

Posted Jul 13, 2019 12:02 UTC (Sat) by ju3Ceemi (subscriber, #102464) [Link]

Is there a solution for an "ipv6-like" protocol that stays retro-compatible with ipv4?

From my point of view, there is nothing you can do to let ipv4 devices talk to a wider addr space: the IP is nothing but a unique identifier, which de facto makes your wish unrealizable

5.3 Merge window, part 1

Posted Jul 13, 2019 13:51 UTC (Sat) by farnz (subscriber, #17727) [Link] (13 responses)

I hear that a lot from armchair designers, but I've yet to have one enumerate an actual way for IPv6 to be retro-compatible with IPv4 that isn't present in IPv6 already.

IPv4 addresses are a subset of IPv6 addresses - if you have an IPv4 address, then it also exists in (at least) five different forms for different (mostly failed) compatibility mechanisms in IPv6:

  1. As an "IPv4-Compatible IPv6 Address" in ::/96 - e.g. 192.0.2.1 is also ::c000:201. The transition mechanisms that wanted to use this form of addressing (e.g. the original form of SIIT) never reached deployable state, and thus this form is now deprecated.
  2. As an "IPv4-Mapped IPv6 Address" in ::ffff:0:0/96 - e.g. 192.0.2.1 is also ::ffff:c000:201. This is still in use, so that applications can pretend that IPv4 does not exist, and use these to refer to IPv4 addresses (when talking to the network stack, and in some protocols).
  3. In 64:ff9b::/96 for NAT64 and 464XLAT purposes - e.g. 192.0.2.1 is also 64:ff9b::c000:201. This allows you to communicate over an IPv6-only network with a central IPv4 host that does NAT for you, getting you the CGNAT experience when talking to IPv4 hosts; this is deployed in (at least) T-Mobile USA and EE UK mobile networks.
  4. Lots of times in 64:ff9b:1::/48 (depending on provider) to allow you to have multiple NAT64 or 464XLAT deployments addressed differently.
  5. As a 6to4 prefix in 2002::/16, for example 192.0.2.1 gets you control of all of 2002:c000:201::/48. This lets you route IPv6 as an overlay on your existing IPv4 network, but is not popular because properly deployed IPv4 will always be at least as fast as or faster than IPv6 carried over an IPv4 network.

On top of that, SIIT got reworked into SIIT-DC, which allows a pure IPv6 network to choose a prefix for IPv4 use and do stateless NAT instead of NAT64. And, of course, you can do NAT64 or 464XLAT in a private prefix.

Each of these mechanisms has its own set of problems as compared to running dual stack networks; 464XLAT and NAT64 are only happening now because in the mobile world, operators are beginning to experience pain from running NAT'd IPv4, and reducing the need for NAT via IPv6 saves actual dollars. And, if nothing else, pure IPv4 has the advantage of zero change needed - any alternative IPv6 proposal needs to cope with IPv4-only hosts that refuse any form of change to support IPv6, otherwise you face the same problems as SIIT, NAT64, and 6to4 do.

5.3 Merge window, part 1

Posted Jul 13, 2019 16:11 UTC (Sat) by plugwash (subscriber, #29694) [Link] (12 responses)

6to4 has two big problems.

1. Routers don't relay by default, the relaying is handled by specific relay routers, which most ISPs don't implement.
2. It doesn't work with NAT.

A mechanism similar to 6to4 but designed to work with NAT (not to fight against it like teredo does) and that was implemented by default as part of every dual stack router and OS would IMO have eased deployment considerably.

5.3 Merge window, part 1

Posted Jul 13, 2019 16:25 UTC (Sat) by farnz (subscriber, #17727) [Link] (11 responses)

The lack of relay routers will exist for any overlay network - and note that if you have IPv4 on both ends, you can route to 2002::/16 over IPv4 directly, not needing a relay router. There's no particular reason why all dual-stack routers weren't relay routers, except that operators did not want to support any IPv6 transition mechanism at all - not even 6to4.

And Teredo is as good as you can get given the way NAT works; fighting NAT is the norm when you're trying to tunnel through it.

As I said, it's easy to set the goals for a better mechanism, it's a lot harder to actually design it.

5.3 Merge window, part 1

Posted Jul 14, 2019 2:55 UTC (Sun) by plugwash (subscriber, #29694) [Link] (10 responses)

> The lack of relay routers will exist for any overlay network

Yes it will for any overlay network invented *now*.

If such encapsulation had been part of the core protocol from the start and relaying behaviour had been mandated or at least strongly recommended as part of every dual stack router then we wouldn't have the mess of relay shortages we have today.

> fighting NAT is the norm when you're trying to tunnel through it.

The alternative to fighting the NAT is to go with the flow of it; that is, when de-encapsulating, modify the v6 addresses to match the v4 addresses/ports.

Of course such an idea is unthinkable to the "end to end is sacred" crowd.

5.3 Merge window, part 1

Posted Jul 14, 2019 3:18 UTC (Sun) by gus3 (guest, #61103) [Link]

The "end to end is sacred" crowd put far too much trust in the network at-large. Replace one switch with one hub, and the whole "sacred" thing gets scrambled into "scared" instead.

That crowd should shift their thinking, to "just make sure the damn thing keeps working the way it did."

5.3 Merge window, part 1

Posted Jul 14, 2019 10:17 UTC (Sun) by farnz (subscriber, #17727) [Link] (8 responses)

I'm not talking about overlay networks invented now - I'm talking about overlay networks invented back in 1999, when IPv6 was new. And 6to4 encapsulation was strongly recommended as part of every dual stack router back then; the only error, IMO, in 6to4, was the effort expended to try and get people to run public relays, when we would have been better off with 2002::/16 containing the entire IPv4 routing table over time, and with everyone routing their own subset of 2002::/16, on the basis that when native IPv6 arrives, we'd be able to turn it off.

Further, note that there's no-one able to mandate relay operation - operators want to control what they offer, and won't offer free services just for the sake of it, especially not expensive ones like relays.

Going with the flow of NAT is what NAT64 (called NAPT in early IPv6 documents) does. It's not been deployed until recently because it provides zero gain unless you're planning to turn off IPv4 for large swathes of your network.

It's perhaps worth remembering that the whole reason Teredo exists is that Microsoft wanted an easy way for game developers to write NAT traversal multiplayer games; Teredo was their answer, in that game developers just write IPv6-only games ignoring the existence of NAT, and Microsoft handles the NAT traversal problem for you in Teredo.

Again, though, I don't see constructive answers on how IPv6 could have been technically better at retrocompatibility - just claims that it can't be fixed (despite the fact that the very fix you're suggesting was in IPv6 in 1999, and ignored by network operators), and an insult to the people working on this stuff. What protocol changes would you have made that make IPv6 more compatible with IPv4?

5.3 Merge window, part 1

Posted Jul 14, 2019 10:37 UTC (Sun) by ianmcc (subscriber, #88379) [Link] (7 responses)

Make the IP address variable length; IPv4 would just be the special case where the address is 4 bytes.

5.3 Merge window, part 1

Posted Jul 14, 2019 11:02 UTC (Sun) by farnz (subscriber, #17727) [Link] (6 responses)

That has two problems, one inherent to variable length addressing, and one an upgrade problem, and also ignores the fact that IPv4 is already a special case of IPv6.

  1. Until the last IPv4-only host has upgraded to support variable length addressing, all hosts need to have a 4 byte address. Like with IPv6 deployment, this is a chicken-and-egg problem; why would I choose to have a 5 byte or longer address when I could stick to 4 byte addresses and not have any interop issues?
  2. Hardware-assisted routing has to handle addresses in fixed size chunks, and the routing delay is proportional to the number of chunks the hardware handles (i.e. the smaller the chunk, the bigger the penalty for a long address). With a variable length address, as CLNP proved with its NSAPs, there is an incentive to keep to short addresses, because the unfortunate souls with longer addresses have slower networking. In turn, this doesn't resolve the address shortage for the long term - there's always incentive to have "IPv4" addresses.

In IPv6, all addresses have the same hardware latency, and IPv4 is a special case of IPv6 anyway (this fact is used by SIIT-DC with a per-DC IPv6 /96 matching all of IPv4, and was the idea behind SIIT).

5.3 Merge window, part 1

Posted Jul 15, 2019 11:45 UTC (Mon) by ianmcc (subscriber, #88379) [Link] (5 responses)

Problem 1 is common to IPv6 as well. Legacy devices can go behind a NAT router, which basically everything is already.

Done properly, I don't think variable length addresses need to have a performance penalty, and indeed it might end up faster. E.g., if my ISP is allocated the address A.B.C, then they allocate their customers addresses of the form A.B.C.D. The routing tables only need to refer to A.B.C, and the ISP's routers only need to look up on D. My home network can use addresses of the form A.B.C.D.E (or additional devices hanging off something can get A.B.C.D.E.F et cetera). This doesn't change the lookup time for the upstream routers because they just ignore the parts of the address that are not relevant for them.

5.3 Merge window, part 1

Posted Jul 15, 2019 13:30 UTC (Mon) by farnz (subscriber, #17727) [Link]

While problem 1 is common to IPv6 during the transition period, the problem with variable length addresses that offer 1:1 IPv4 compatibility (the case where source + destination are 4 bytes) is that the transition period is effectively infinite - there is never a penalty for refusing to migrate, whereas in IPv6 land, there is a penalty for failure to migrate once a tipping point is reached. For example, today it is the case that if you care about the performance of your servers when accessed via a mobile phone, you need to support IPv6, because for significant subsets of mobile users, IPv4 goes via a remote NAT, while IPv6 takes the shortest route.

And variable length addresses always have a performance or cost penalty in hardware, which never goes away. For a fixed size address, the router simply reads the address and acts on it. For a variable length address, the router has to read the length, read the first chunk of the address, mask off any parts of the first chunk of address that aren't valid, attempt to act on it, and then if the needed part of the address is longer than the chunk, repeat for the next chunk. Worse, if you're not cautious, router manufacturers will attempt to "get away" with not handling the full complexity - e.g. only route on the first N bits, and ignore the rest of the address - and if those routers become common, you've effectively shrunk the routable component of the address. We've seen this in IPv4 in the 80s, where routers fell back to a slow path if the routing prefix was too long (more than 16 bits), and we've seen this in IPv6 routers that only route on the first 64 bits of the address. Variable length addressing just makes this harder, because you also have to handle the pain that 32 bit "1.1.1.1" is not guaranteed to route to the same place as 64 bit "1.1.1.1/32", which is not guaranteed to route to the same place as 128 bit "1.1.1.1/32" (well, unless you remove the requirement that 32 bit "1.1.1.1" routes to the same place as IPv4 "1.1.1.1").

This extra complexity is inherent to variable length addressing, and makes the hardware more complex; in turn, this means that you either need more complex hardware to handle lookups in the same number of clock cycles, or you need more clock cycles to do the same lookup. Fixed length addresses avoid this - you always read a fixed size chunk and then act on it.

5.3 Merge window, part 1

Posted Jul 15, 2019 13:46 UTC (Mon) by excors (subscriber, #95769) [Link]

> Eg, if my ISP is allocated the address A.B.C, then they allocate their customers addresses of the form A.B.C.D.

I think the problem is that in practice, strict hierarchical addressing doesn't work. E.g. there's anycast, where the same IP address is advertised by multiple servers around the world, and users will get routed to whichever one is nearest (based on BGP's definition of "nearest"). Or for redundancy you might want one server to advertise a single IP prefix through two ISPs, so if one fails it'll get routed through the other.

Non-hierarchical usage of the IPv4 space has been a known issue for many years, causing significant expansion of routing tables (see e.g. https://bgp.potaroo.net/). That's quite a problem when routers store the table in expensive content-addressable memory (for efficient lookups), and the table size grows too large for the hardware.

There's a more fundamental issue with IP addresses being both "locator" and "identifier". Originally they were seen as locators, i.e. a hierarchical address that describes how to find the server with increasing specificity, with routing based on IP prefixes and CIDR etc. DNS mapped stable identifiers (domain names) onto addresses. DNS didn't work well enough for that, so nowadays IP addresses are often just identifiers and don't indicate anything about the actual location of the server (as with anycast and multihoming), but routing protocols weren't designed to be efficient identifier lookup services. Occasionally people have tried to disentangle the two concepts, like with LISP, but I don't know if they've had any success.

5.3 Merge window, part 1

Posted Jul 15, 2019 14:47 UTC (Mon) by imMute (guest, #96323) [Link] (2 responses)

That's how route aggregation works today. Route lookups are already fast using hardware TCAM. Variable length addresses would make the TCAM implementation harder. Or, more likely, they'd just make the TCAM entries the max size allowed by the variable length spec, and you'd end up with smaller tables that waste space.

5.3 Merge window, part 1

Posted Jul 15, 2019 15:10 UTC (Mon) by farnz (subscriber, #17727) [Link]

Note, too, that a variable length address space limited to N bits of address can be mapped into a fixed size address space of size N+1 bits. You add a prefix bit which is 1 if the next N bits are the full address, or 0 otherwise, and do this recursively. You can then unmap by counting leading 0s to retrieve the address size, strip the next 1 bit, and the remainder is the address.

In other words, unless your variable length address is greater than 127 bits in maximum size, it can be entirely mapped into IPv6.

5.3 Merge window, part 1

Posted Jul 16, 2019 23:56 UTC (Tue) by mtaht (subscriber, #11087) [Link]

I've kind of wondered how much of the internet, particularly the IPv6 portion, is actually routed by TCAM based hardware. Software routing in SDR and Linux/BSD based implementations seems to be on the rise.

5.3 Merge window, part 1

Posted Jul 16, 2019 23:29 UTC (Tue) by mtaht (subscriber, #11087) [Link]

The amount of effort required for more ipv4 as we are doing it is trivial. It's a string of essentially one line patches to a dozen OSes, and a recompile of userspace, if you want to use the new formerly reserved for future multicast addresses. 89 packages, total, in fedora, needed to be recompiled (none, for 0/8, or 240/4)

And testing of course! We reached the point in our testbeds where nothing less than a global scale (e.g. a commit to linux mainline) test would prove anything.

Eventual politics and standardization efforts will eat 1000s more of (hopefully someone else's) time than either of these two phases.

IPv6 is way harder in many, many respects. And still, desperately needed.

Making 0/8 work, universally, is going to be *easy*.

Maybe if 0/8 creates enough controversy, someone will listen to me about what more is needed to accelerate ipv6 adoption.

5.3 Merge window, part 1

Posted Jul 16, 2019 23:09 UTC (Tue) by mtaht (subscriber, #11087) [Link] (12 responses)

Wow. I had no idea this would spawn so much comment.

1) The context of the effort and talk: https://www.netdevconf.org/0x13/session.html?talk-ipv4-un...

And I like to think it's one of my better talks, despite the clothing malfunction.

Yes! we need faster ipv6 adoption! But if we want to interconnect with the rest of the internet ipv4 is still going to be required for a very, very, long time. Think hard about this:

I'd asked everyone in the room on the "just deploy ipv6" side - which I used to be on, too! - to think about: "Even if they have deployed IPv6, growing networks *must continue to acquire* scarce, increasingly expensive IPv4 addresses to interconnect with the rest of the Internet." -
https://www.internetgovernance.org/2019/02/20/report-on-i...

And think harder on the ipv6 deployment problems. I have a long note about what's going wrong with that, too, which I need to finish and write up somewhere. I like to think we made one giant leap with the cerowrt project, but more major leaps are required to finish ipv6 adoption in any foreseeable amount of time.

Anyway, the conclusion John Gilmore, Paul, and I came to was that, any way we thought about the future of the internet, more ipv4 addresses were going to be needed. Making 'em is a start. No matter what we do to make more... IPv4 prices look set to skyrocket in the coming years.

Anyway, to answer another thought, 0/8 (16m addresses) is more than Amazon, Google, and Facebook have, combined. 240/4, for which we removed the last barrier to adoption in Linux last December, has ~268m addresses. 225/8-231/8 have ~120m. Previous allocation policies DO need to be rethought in order to make best use of these, but first up is making it technically feasible at all.

Another thought: On any given day only about 700m ipv4 addresses appear on the internet.

Lastly... it might take 5-7 years to make these address ranges fully usable, but even then, based on the ipv6 deployment curves in the above report, it still seems worth it.

It always helps to have more folk helping, of course; please see https://github.com/dtaht/unicast-extensions for more info and patches to various daemons and other OSes.
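The sizes of the ranges mentioned above are easy to verify (a quick sketch using Python's standard `ipaddress` module; the annotations are my own arithmetic):

```python
import ipaddress

# Count the addresses in the formerly unusable ranges:
for cidr in ("0.0.0.0/8", "240.0.0.0/4"):
    net = ipaddress.ip_network(cidr)
    print(cidr, net.num_addresses)
# 0.0.0.0/8   -> 16,777,216  addresses (~16m)
# 240.0.0.0/4 -> 268,435,456 addresses (~268m)

# 225/8 through 231/8 is seven /8s:
print(7 * 2**24)   # 117440512 (~120m)
```

Together that is roughly 400m addresses, which is why the comment argues the recovery effort is worthwhile even on a multi-year timescale.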

5.3 Merge window, part 1

Posted Jul 18, 2019 9:01 UTC (Thu) by jem (subscriber, #24231) [Link] (11 responses)

>I'd asked everyone in the room on the "just deploy ipv6" side - which I used to be on, too! - to think about: "Even if they have deployed IPv6, growing networks *must continue to acquire* scarce, increasingly expensive IPv4 addresses to interconnect with the rest of the Internet."

I believe the Internet is going to break up into two islands at some point anyway. There is only so much juice you can squeeze out of an orange. We will reach a tipping point where the cost of getting IPv4 addresses and the popularity of IPv6 make it more economical for some services to go IPv6-only. The problem with today's thinking - that IPv4 is the reference and IPv6 some kind of add-on - combined with the exponential growth of the Internet, is that it will hit a wall at some point. I will not feel sorry for the companies going bankrupt because of this.

>Anyway, to answer another thought, 0/8 (16m) is more addresses than amazon, google, and facebook have, combined.

It is my understanding that Facebook moved to IPv6 on their internal network, because 10.0.0.0/8 was not enough for the size and topology of the network.

>And think harder on the ipv6 deployment problems. I have a long note about what's going wrong with that, too, which I need to finish and write up somewhere.

Please do, I would like to read about these problems. It's funny how some organizations seemingly had no trouble deploying IPv6 half a decade ago, while others seem to be struggling hard. Which makes me believe they aren't struggling at all; they just have an attitude problem. Or how do you explain the great differences between ISPs within a single country? Or that there does not seem to be a correlation between deployment rate and the status of a country as a developing vs. industrialized nation? What explains that Google's statistics show Germany having an IPv6 deployment rate of 42% and neighboring Denmark only 3.54%? Or Malaysia at 38% vs. Indonesia at 0.32%?

5.3 Merge window, part 1

Posted Jul 18, 2019 9:56 UTC (Thu) by farnz (subscriber, #17727) [Link]

The other point is that between DSR and IPv4 in IPv6 tunnels for services, and the gradual move to IPv6, it's possible for someone like Facebook to stop needing more IPv4 as long as they have full IPv6 support.

This then decouples your growth from your IPv4 address needs to some degree; you only need enough IPv4 that each load balancer can receive a full set of requests. With current hardware, that lets each IPv4 address receive tens of gigabits per second of uploads and download requests for your service, while responses to the Internet are handled by DSR.

So, as long as people are actively migrating to IPv6 with IPv4 as an add-on, the demand for IPv4 will fall. You just don't need that much IPv4 per service user when you treat IPv4 as a secondary protocol.

5.3 Merge window, part 1

Posted Jul 18, 2019 17:58 UTC (Thu) by mtaht (subscriber, #11087) [Link] (9 responses)

You just made some good points and I'm out of time today to reply - tomorrow, perhaps. In the meantime, I wanted more folk to plow through the report here:
https://via.hypothes.is/https://www.internetgovernance.or...

and think about it.

5.3 Merge window, part 1

Posted Jul 18, 2019 18:27 UTC (Thu) by mtaht (subscriber, #11087) [Link]

and while I'm tossing random links in for thought, this piece has been on my mind a lot lately: https://apenwarr.ca/log/20170810

5.3 Merge window, part 1

Posted Jul 19, 2019 17:46 UTC (Fri) by jem (subscriber, #24231) [Link] (7 responses)

I read the report. Here are a couple of random reflections:

> The prospect of what some engineers have called “IPv4 runout” was the main reason for developing IPv6 in the first place. From an economic point of view, however, resources never just “run out;” instead, as their supply diminishes they become increasingly expensive, and consumption patterns adapt to scarcity with greater conservation and new forms of substitution.

Here we have the tortoise and the hare again. Take the number of potentially available IPv4 addresses. Let's say half of them change owners to make better use of them. We still have half of the available addresses left. Now half of them can be sold, and we still have half left!

> The incentives provided by the secondary market have led to the identification of millions of unused or underutilized IPv4 numbers by brokers such as IPv4 Market Group and exchanges such as Addrex and Hilco Streambank.

Remember, the demand for (old) new IPv4 addresses is one /8 per month (16 million).

The report compares IPv6 and IPv4 purely from a cost perspective, but completely fails (or I didn't find it) to consider the scenario where supporting IPv6 is a must for an ISP, lest they lose their customers. Let's say some reasonably popular services start popping up as IPv6-only, because that makes more sense economically.

5.3 Merge window, part 1

Posted Jul 19, 2019 18:11 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> The report compares IPv6 and IPv4 purely from a cost perspective, but fails completely (or I didn't find it) to consider the scenario that supporting IPv6 is a must for an ISP, or they will lose their customers.
This will not happen until IPv6 is firmly in place. More realistically, CGNs are a growing pain for ISPs and users, and 464XLAT is a way to make them somewhat less painful.

Once the remaining IPv4-only hardware is phased out (within the next 3-5 years), the only barrier preventing IPv6 adoption will be simple organizational inertia.

5.3 Merge window, part 1

Posted Jul 21, 2019 2:49 UTC (Sun) by mtaht (subscriber, #11087) [Link] (1 responses)

Ya know, I've been interacting with you two (at least) for over a decade, and I have not the foggiest idea who you are!

I've been tired of typing of late, so I was thinking that pulling together a videoconference to discuss these issues would be fun. Would that work for you?

We used to do a thing on the VUC show pretty regularly.

5.3 Merge window, part 1

Posted Jul 22, 2019 2:00 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

Uhm, me? Sure, feel free to drop me an email at alex.besogonov@gmail.com

5.3 Merge window, part 1

Posted Jul 23, 2019 13:04 UTC (Tue) by nilsmeyer (guest, #122604) [Link] (3 responses)

> The report compares IPv6 and IPv4 purely from a cost perspective, but fails completely (or I didn't find it) to consider the scenario that supporting IPv6 is a must for an ISP, or they will lose their customers. Let's say some reasonably popular services start popping up as IPv6-only, because that makes more sense economically.

That would require the cost of an IPv4 address to rise astronomically, and currently that's not how the economics of IPv4 work. Most customers I work with don't bother with IPv6 at all, when I'm in the position to set up infrastructure I usually just sneak it in since it's mostly a freebie. Then again, my personal servers currently don't have an IPv6 address assigned even though I have DS-Lite at home...

5.3 Merge window, part 1

Posted Jul 23, 2019 15:49 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

There's a degree of geographic luck involved, too - most of the developed world has enough IPv4 that there's no short-term shortage (no need to put everyone behind CGNAT, for example). IPv6 is thus something you do because you want it, not because you need it - it's worth it for the big players (Google/YouTube, Netflix, Facebook) because it lets you bypass CGNAT on mobile and in countries with IPv4 shortage, which improves your performance metrics.

In contrast, in a reasonable number of less developed countries, you're stuck behind CGNAT for IPv4 whether you like it or not, and need IPv6 if you want to run a server other than an onion service, or you pay for AWS/GCE/other services in a country with enough IPv4. If you're lucky, your ISP is sane enough to run 464XLAT or dual-stack; if you're unlucky, you're IPv4-only and have no choice but pay for Western services.

5.3 Merge window, part 1

Posted Jul 24, 2019 7:33 UTC (Wed) by jem (subscriber, #24231) [Link] (1 responses)

> There's a degree of geographic luck involved, too - most of the developed world has enough IPv4 that there's no short-term shortage (no need to put everyone behind CGNAT, for example).

According to Wikipedia, "RIPE NCC, the regional Internet registry for Europe, was the second RIR to deplete its address pool on 14 September 2012", after APNIC. AFRINIC was the last. The situation in North America may be a bit better.

> [IPv6 is] worth it for the big players (Google/YouTube, Netflix, Facebook) because it lets you bypass CGNAT on mobile and in countries with IPv4 shortage, which improves your performance metrics.

Google (including YouTube) and Facebook are doing operators like T-Mobile a big favor, because they generate a large portion of Internet traffic, which means packets between the handset and YouTube don't have to go through NAT64. It's all end-to-end with no address or protocol conversion, the way the Internet was originally designed to work, only using IPv6 this time.

5.3 Merge window, part 1

Posted Jul 24, 2019 8:20 UTC (Wed) by farnz (subscriber, #17727) [Link]

It's not the RIR holdings of IPv4 that matter - an RIR running out simply means that new ISPs in a region cannot get started. Instead, it's the IPv4 holdings of each ISP in the region that matter, compared to the size of its customer base; if (to choose an example that has deployed IPv6) Comcast in the USA has 50 million potential customers in its service area, and a total of 60 million IPv4 addresses across all its ASes, it can shuffle IPv4 around to give each customer at least one address, and thus avoid CGNAT.

In contrast, if an ISP in Botswana has 50,000 IPv4 addresses (half the total assigned to Botswana), it can't offer one IP address per user in its service area (it has to CGNAT 7:1 if it claims the whole of the current Internet market in Botswana, and worse if the Internet grows). Because the RIRs are out, said ISP can only grow by buying addresses from someone else when it needs them - so the incentive is there to CGNAT.
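The arithmetic behind the 7:1 figure above is straightforward (a back-of-envelope sketch; the 350,000-user market size is my inference from the numbers given, not a figure from the comment):

```python
import math

addresses = 50_000      # the example ISP's IPv4 holding (half of Botswana's total)
users = 350_000         # assumed addressable market implied by the 7:1 figure

# How many users must share each public IPv4 address under CGNAT:
ratio = math.ceil(users / addresses)
print(f"{ratio}:1 CGNAT")          # 7:1 CGNAT

# To give every user a dedicated address, the ISP would have to buy
# the shortfall on the secondary market:
print(users - addresses)           # 300000
```

And the ratio only worsens as the market grows, which is exactly the incentive toward CGNAT the comment describes.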

Plus, you then have user expectations to take into account; in the USA, people expect the experience they get from stateless routing by ISPs, and NAT under their control; this means that CGNAT is expensive, because it has to maintain state in such a way that a single device failure does not lose a user's NAT mappings. In other countries, CGNAT can be cheaper because it's all the local users have ever known, so it can be unreliable, and that's just the way the Internet is.

And IPv6 is a mutual thing for the big players and mobile operators - T-Mobile loses demand on its CGNAT (which they like), and the big operators get improvements on their latency metrics (which they like). Note that, for example, just-in-time sending of the next chunk of video involves smaller chunks if the latency from server to client is lower, and you can spend more time preparing a page for the same TTI if the latency is lower.

HP-PA / PA-RISC

Posted Jul 26, 2019 11:29 UTC (Fri) by meuh (guest, #22042) [Link] (1 responses)

> The pa-risc architecture now supports dynamic ftrace

HP-PA / PA-RISC seems more maintained as an architecture, than, say, Itanium / IA-64. In particular, QEMU has support for HP-PA / PA-RISC emulation, but not for Itanium / IA-64.

As I've never seen nor used a physical HP-PA machine, I can only guess they're mostly some kind of mainframe, where upgrading to a newer Linux kernel won't happen anytime soon.

So I wonder what the incentive is to keep HP-PA / PA-RISC alive, beside the usual hacker motives.

HP-PA / PA-RISC

Posted Jul 26, 2019 11:44 UTC (Fri) by pizza (subscriber, #46) [Link]

Itanium hardware can still be purchased new (until 2021), but PA-RISC hardware was discontinued in 2008, and emulating obsolete hardware is a lot cheaper than rewriting a boatload of [usually highly] proprietary software.

(It could be argued that the stuff that was easily ported from PA-RISC has already been ported (be it to x86/x86_64 or Itanium), so all that's left is the stuff that can't be ported but is so critical that it's worth emulating forever.)

Give it a few more years, and you'll probably see interest in Itanium emulation pick up.


Copyright © 2019, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds