Native Python support for units?
Back in April, there was an interesting discussion on the python-ideas mailing list that started as a query about adding support for custom literals, a la C++, but branched off from there. Custom literals are frequently used for handling units and unit conversion in C++, so the Python discussion fairly quickly focused on that use case. While ideas about a possible feature were batted about, it does not seem like anything that is being pursued in earnest, at least at this point. But some of the facets of the problem are, perhaps surprisingly, more complex than might be guessed.
Custom literals
On April 1, Will Bradley posted a, presumably non-joking, query about custom literal support for Python; "has this been considered and rejected, or is there a reason it's unpopular?" According to Stephen J. Turnbull, user-defined syntax for Python has been a hard sell, in general, though literal syntax for units (e.g. "10m" for meters or "1.2kW" for kilowatts) has gotten somewhat further. In addition, the idea of adding a literal syntax for fixed-point constants that use the decimal package also crops up with some frequency, he said. His recollection is that "in general Python has rejected user-defined syntax on the grounds that it makes the language harder to parse both for the compiler and for human beings".
Brian McCall warned that he was getting up on his soap box, but said that he strongly supported Python (and, indeed, all programming languages) having native units. "It's not often that I would say that C++ is easier to read or more WYSIWYG than Python, but in this case, C++ is clearly well ahead of Python." He suggested that "lack of native language support for SI units" is particularly problematic because of the prevalence of scientific computing today. SI units are the international system of units, which are often referred to as the metric system.
Anyone who has ever dealt with units will immediately recognize a problem associated with enshrining only the SI units into Python (or anywhere else): other measurement systems exist, including imperial and US units and even other metric systems. As Ricky Teachey put it:
BUT- SI units isn't enough. Engineers in the US and Canada (I have many colleagues in Canada and when I ask they always say: we pretend to use SI but we don't) have all kinds of units.
Give us native, customizable units, or give us death! Who's with me??!!
Units
Beyond the conflict over measuring-system support, there are other fundamental questions that need to be resolved before new syntax could be added, Chris Angelico said. The number "4K" might mean four kelvins, 4000, or 4096, depending on context, for example. Angelico suggested that an existing bit of syntax could be repurposed for units:
But I would very much like to see a measure of language support for "number with alphabetic tag", without giving it any semantic meaning whatsoever. Python currently has precisely one such tag, and one conflicting piece of syntax: "10j" means "complex(imag=10)", and "10e1" means "100.0". (They can of course be combined, 10e1j does indeed mean 100*sqrt(-1).) [...] In Python, I think it'd make sense to syntactically accept *any* suffix, and then have a run-time translation table that can have anything registered; if you use a suffix that isn't registered, it's a run-time error. Something like this:
    import sys
    # sys.register_numeric_suffix("j", lambda n: complex(imag=n))
    sys.register_numeric_suffix("m", lambda n: unit(n, "meter"))
    sys.register_numeric_suffix("mol", lambda n: unit(n, "mole"))

[...] Using it would look something like this:

    def spread():
        """Calculate the thickness of avocado when spread on a single slice of bread"""
        qty = 1.5mol
        area = 200mm * 200mm
        return qty / area
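Since the suffix syntax does not exist in Python, the run-time translation table can only be approximated today. The following is a minimal sketch that parses strings such as "1.5mol" instead of real literals; the module-level register_numeric_suffix(), the unit() stand-in, and parse_literal() are all hypothetical names invented for illustration, not anything in the sys module.

```python
import re

# Hypothetical registry; nothing like this exists in the sys module today.
_suffixes = {}

def register_numeric_suffix(suffix, factory):
    """Associate a suffix such as "m" with a callable taking the number."""
    _suffixes[suffix] = factory

def unit(n, name):
    """Stand-in for a real quantity type: just pair the number with a name."""
    return (n, name)

# A run of digits, an optional fractional part, then an alphabetic suffix.
_LITERAL = re.compile(r"([0-9][0-9_]*\.?[0-9]*)([A-Za-z]+)")

def parse_literal(text):
    """Evaluate a string like "1.5mol" the way the proposed syntax might."""
    match = _LITERAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not a suffixed number: {text!r}")
    digits, suffix = match.groups()
    if suffix not in _suffixes:
        # The proposal makes an unregistered suffix a run-time error.
        raise LookupError(f"unregistered numeric suffix: {suffix!r}")
    number = float(digits) if "." in digits else int(digits)
    return _suffixes[suffix](number)

register_numeric_suffix("m", lambda n: unit(n, "meter"))
register_numeric_suffix("mol", lambda n: unit(n, "mole"))

print(parse_literal("1.5mol"))   # (1.5, 'mole')
print(parse_literal("200m"))     # (200, 'meter')
```

Because the table is a single module-level dictionary, any code that registers "m" changes its meaning for everyone else in the process.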
Greg Ewing objected to the idea of a global registry since two libraries might want to use the same suffix for different units. But there are other reasons modules might need their own definitions, as Steven D'Aprano pointed out. There are multiple units called a "mile" (e.g. Roman, international, nautical, US survey, imperial, ...), for one thing, but even the definitions of the "same" unit may have changed over time: "a kilometre in 1920 is not the same as a kilometre in 2020, and applications that care about high precision may care about the difference". He thinks that units should be scoped like variables or, at least, have a separate per-module namespace where libraries can register their own units if they wish to.
Angelico did not agree, at least in part because he did not see the need to make things more complex for what he sees as a rare use case. He seemed to be the only participant in the thread who thought that way, however, as D'Aprano, Ewing, and others were all fairly adamant that some kind of unit namespace would be needed. Though the reply is perhaps a bit on the rude side, D'Aprano put it this way:
Units are *values* that are used in calculations, not application wide settings. The idea that libraries shouldn't use their own units is as silly as the idea that libraries shouldn't use their own variables.
Units are not classes, but they are sort of like them. You wouldn't insist on a single, interpreter wide database of classes, or claim that "libraries shouldn't create their own classes".
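The difference between one interpreter-wide table and scoped definitions can be illustrated with a small sketch. The UnitNamespace class and the two library names below are hypothetical; the only facts assumed are the standard lengths of the international mile (1609.344 meters) and the nautical mile (1852 meters).

```python
class UnitNamespace:
    """A per-library table of unit definitions (hypothetical, for illustration)."""

    def __init__(self, owner):
        self.owner = owner
        self._meters_per_unit = {}

    def define(self, symbol, meters):
        """Record how many meters one `symbol` is worth in this namespace."""
        self._meters_per_unit[symbol] = meters

    def to_meters(self, value, symbol):
        """Convert `value` expressed in `symbol` to meters."""
        return value * self._meters_per_unit[symbol]

# Two imaginary libraries, each with its own idea of a "mile".
road = UnitNamespace("roadmaps")
road.define("mile", 1609.344)      # international mile

sea = UnitNamespace("seacharts")
sea.define("mile", 1852.0)         # nautical mile

# Both definitions coexist because neither lives in a shared global table.
print(road.to_meters(10, "mile"))
print(sea.to_meters(10, "mile"))
```

With a single global registry, the second define("mile", ...) would either silently replace the first or have to be rejected, which is the conflict Ewing and D'Aprano were describing.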
Other possibilities
Ewing suggested a solution that could perhaps be done with no (or minimal) syntax changes:
Treating units as ordinary names looked up as usual would be the simplest thing to do.
If you really want units to be in a separate namespace, I think it would have to be per-module, with some variant of the import statement for getting things into it.

    from units.si import units *
    from units.imperial import units inch, ft, mile
    from units.nautical import units mile as nm
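The first option, units as ordinary names, needs no new syntax at all. Here is a minimal sketch under the assumption of a toy Length class that stores everything in meters; the class and the way the names are defined are illustrative and are not taken from any existing library.

```python
class Length:
    """A length stored internally in meters (illustrative only)."""

    def __init__(self, meters):
        self.meters = meters

    def __rmul__(self, number):
        # Supports `3 * ft`: a plain number times a unit gives a new Length.
        return Length(number * self.meters)

    def __add__(self, other):
        return Length(self.meters + other.meters)

    def __truediv__(self, other):
        # Dividing two lengths gives a plain, unitless ratio.
        return self.meters / other.meters

    def __repr__(self):
        return f"Length({self.meters!r} m)"

# Units are just module-level names, so normal scoping and imports apply.
m = Length(1.0)
inch = Length(0.0254)
ft = 12 * inch
mile = 5280 * ft
nm = Length(1852.0)    # nautical mile, bound to a name that does not clash

distance = 1 * mile + 100 * ft
print(distance / m)          # the same distance expressed in meters
print((1 * nm) / mile)       # nautical miles per statute mile
```

The cost of this approach is visible in the sketch: short names such as m are now taken in the module's namespace and cannot also be used as ordinary variables.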
Teachey had mentioned the Pint library in his reply. It is perhaps the most popular Python Package Index (PyPI) library for working with units. Pint comes with a whole raft of units, which can be easily combined in various ways. For example:
    >>> import pint
    >>> ureg = pint.UnitRegistry()
    >>> speed = 17 * ureg.furlongs / ureg.fortnight
    >>> speed
    <Quantity(17.0, 'furlong / fortnight')>
    >>> speed.to('millimeter/second')
    <Quantity(2.82726756, 'millimeter / second')>
    >>> d = 1 * ureg.furlong
    >>> d.to('feet')
    <Quantity(660.00132, 'foot')>
    >>> d.to('mile')
    <Quantity(0.12500025, 'mile')>
At least on my system, Pint seems to have a slightly inaccurate value for a furlong, which is defined as 1/8 mile, or 660 feet; a fortnight is, of course, two weeks or 14 days. That oddity aside, Pint has much of the functionality users might want, but it (and other Python unit-handling libraries) have "so many shortfalls", Teachey said, mostly because they are not specified and used like real-world units are.
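One explanation that is consistent with the numbers shown, though it has not been checked here against Pint's unit definitions, is that the furlong is being built from US survey feet while the "foot" and "mile" in the output are the international ones. The arithmetic below only assumes the two standard foot definitions (0.3048 meters and 1200/3937 meters); the idea that this is what Pint does is an assumption.

```python
from fractions import Fraction

INTERNATIONAL_FOOT = Fraction(3048, 10000)   # meters, exact by definition
US_SURVEY_FOOT = Fraction(1200, 3937)        # meters, exact by definition

# Assumption: the furlong is defined as 660 US survey feet.
furlong_m = 660 * US_SURVEY_FOOT

# Express that length in international feet and international miles.
furlong_in_feet = furlong_m / INTERNATIONAL_FOOT
furlong_in_miles = furlong_in_feet / 5280

print(round(float(furlong_in_feet), 5))    # 660.00132
print(round(float(furlong_in_miles), 8))   # 0.12500025
```

Both results match the values in the session above, which suggests a difference between two definitions of the foot rather than an error in the conversion itself.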
Beyond Python libraries, the venerable Unix units utility has
similar capabilities and can be used directly from the command line (its
man page is where the classic "furlongs/fortnight" example comes from). As
D'Aprano noted, units has over 3000 different units, which
can be combined in a truly enormous number of ways.
Ethan Furman started a new thread from Teachey's message in order to focus specifically on native support for units. He floated his own suggestion for new syntax:
Well, if we're spit-balling ideas, what about: 63_lbs or 77_km/hr? Variables cannot start with a number, so there'd be no ambiguity there; we started allowing underbars for separating digits a few versions ago, so there is some precedent.
Teachey wondered about the behavior of the "simple tags" being suggested for units:
[...] What should the behavior of this be?
    height = 5ft + 4.5in
Surely we ought to be able to add these values. But what should the resulting tag be?
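One possible answer to that question, sketched here purely as an assumption and not as anything proposed in the thread, is that the result keeps the tag of the left-hand operand. The Tagged class and its conversion table are hypothetical.

```python
# Conversion factors to a common base (inches); names are illustrative.
_INCHES_PER = {"in": 1, "ft": 12}

class Tagged:
    """A number with a unit tag that is interpreted only when adding."""

    def __init__(self, value, unit):
        self.value, self.unit = value, unit

    def __add__(self, other):
        # Design choice: convert the right operand to the left operand's unit.
        converted = other.value * _INCHES_PER[other.unit] / _INCHES_PER[self.unit]
        return Tagged(self.value + converted, self.unit)

    def __repr__(self):
        return f"{self.value}{self.unit}"

print(Tagged(5, "ft") + Tagged(4.5, "in"))   # 5.375ft
print(Tagged(4.5, "in") + Tagged(5, "ft"))   # 64.5in
```

Under this rule addition is no longer symmetric in how the result is displayed, even though both results describe the same length; other rules (always normalizing to a base unit, for example) are equally defensible.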
He also wondered if a more natural-language-like formulation (e.g. 5ft 4.5in) should be supported. Overall, he thinks that figuring out a solution for units in Python would be a "massive contribution" to the engineering world, "but boy howdy is it a tough [nut] of a problem to crack". Angelico said that the "5ft 4.5in" syntax was a step too far in his mind, but using addition should work. "It's not that hard to say '5ft + 4.5in', just like you'd say '3 + 4j' for a complex number."
Angelico went on to describe the benefits of the syntax change over simply defining constants for units, as Ewing suggested. Since, for example, "m" would only be valid as a unit when it was used as a suffix, it would not pollute the namespace for using "m" as a variable. It is also more readable:
If this were accepted, I would fully expect that libraries like pint would adopt it, so this example:
    >>> 3 * ureg.meter + 4 * ureg.cm
    <Quantity(3.04, 'meter')>
could look like this:
    >>> 3m + 4cm
    <Quantity(3.04, 'meter')>
with everything behaving the exact same after that point. Which would YOU prefer to write in your source code, assuming they have the same run-time behaviour?
But Ken Kundert said that units are primarily useful on input and output, not in the calculations within programs and libraries.
The idea that one carries units on variables interior to a program, and that those units are checked for all interior calculations, is naive. Doing such thing adds unnecessary and often undesired complexity.
His QuantiPhy library provides a means for "reading and writing physical quantities". It effectively adds units as an attribute to Python float values so that they can be used when converting the value to a string. But he said that it might make sense to incorporate scale factors and units into Python itself for readability purposes:
For example, consider the following three versions of the same line of code:
    virt /= 1048576
    virt /= 1.048576e6
    virt /= 1MiB
The last is the easiest to read and the least ambiguous. Using the units and scale factor on the scaling constant results in an easy to read line that makes it clear what is intended.
Notice that in this case the program does not use the specified units, rather the act of specifying the units clarifies the programmers intent and reduces the chance of misunderstandings or error when the code is modified by later programmers.
But this suggests that it is not necessary for Python to interpret the units. The most it needs do is to save the units as an attribute so that it is available if needed later.
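The "save the units as an attribute" idea can be sketched with a float subclass. This is not QuantiPhy's API; UnitFloat is a hypothetical class written only to show that the number behaves like any other float while the units string simply rides along.

```python
class UnitFloat(float):
    """A float that remembers a units string but never interprets it."""

    def __new__(cls, value, units=""):
        self = super().__new__(cls, value)
        self.units = units
        return self

    def __str__(self):
        # The units only matter when the value is turned into text.
        return f"{float(self)!r} {self.units}".rstrip()

MiB = UnitFloat(1048576, "B")    # what a `1MiB` literal might evaluate to
virt = UnitFloat(3145728, "B")

ratio = virt / MiB               # arithmetic falls back to a plain float
print(MiB)                       # 1048576.0 B
print(type(ratio).__name__, ratio)   # float 3.0
```

Note that the units are lost as soon as any arithmetic is done, which fits the view that units belong at the input and output edges of a program rather than in its interior calculations.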
Library deficiencies?
Beyond just Pint and QuantiPhy, the units module was also mentioned in the thread, so Paul Moore wondered why none of those solutions was acceptable. He pointed out that the @ matrix-multiplication operator was added to the language because of arguments from the NumPy community; "language changes are *more likely* based on a thriving community of library users, so starting with a library is a positive way of arguing for core changes". Turnbull echoed that and also wondered if the typing features could be harnessed to help solve the problem.
In a lengthy message, McCall tried to answer Moore's question, though it is not clear that he really changed any minds. He laid out a complicated calculation, with many different units, and showed how it looked using various existing libraries and the syntax proposed by Angelico; for each he listed a set of pros and cons. One could perhaps quibble with his analysis, but that is not really the point, Moore said; what it shows is that "the existing library solutions might not be ideal, but they do broadly address the requirement". Each has its own pain points, so:
Maybe that suggests that there's room for a unified library that takes the best ideas from all of the existing ones, and pulls them together into something that subject experts like yourself *would* be happy with (within the constraints of the existing language). And if new syntax is a clear win even with such a library, then designing a language feature that enables better syntax for that library would still be possible (and there would be a clear use case for it, making the arguments easier to make).
A somewhat late entrant into the syntax derby (though others had shown similar constructs along the way) came from Matt del Valle who suggested that the numeric types (e.g. int, float) could gain units by way of Python's subscript notation, which might look something like:
    from units.si import km, m, N, Pa
    3[km] + 4[m] == 3004[m]   # True
    5[N]/1[m**2] == 5[Pa]     # True
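Integers and floats are not subscriptable in current Python, so the notation above cannot be tried directly. A rough approximation is possible with a wrapper object; in this sketch Q, Unit, and Quantity are hypothetical names, and Q(3)[km] stands in for the proposed 3[km].

```python
class Quantity:
    """A value already converted to the base unit (illustrative only)."""

    def __init__(self, base_value):
        self.base_value = base_value

    def __add__(self, other):
        return Quantity(self.base_value + other.base_value)

    def __eq__(self, other):
        return self.base_value == other.base_value

class Unit:
    """A unit described by its scale factor relative to the base unit."""

    def __init__(self, symbol, scale):
        self.symbol, self.scale = symbol, scale

class Q:
    """Wrapper so that Q(3)[km] can stand in for the proposed 3[km]."""

    def __init__(self, number):
        self.number = number

    def __getitem__(self, unit):
        # Subscripting with a unit attaches it by scaling to the base unit.
        return Quantity(self.number * unit.scale)

km = Unit("km", 1000)
m = Unit("m", 1)

print(Q(3)[km] + Q(4)[m] == Q(3004)[m])   # True
```

The extra Q(...) call is exactly the noise that the proposal would remove by letting the numeric types handle the subscript themselves.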
Moore thought that looked like a plausible syntax, but reiterated his belief that any change would necessarily need to come by way of a library that supporters of "units for Python" developed—and rallied around. That can all be done now, without any need for a PEP or core developer support. After that, a language change could be proposed if it made sense to do so:
Once that library has demonstrated its popularity, someone writes a PEP suggesting that the language adds support for the syntax `number[annotation]` that can be customised by user code. This would be very similar in principle to the PEP for the matrix multiplication @ operator - a popular 3rd party library demonstrates that a well-focused language change, designed to be generally useful, can significantly improve the UI of the library in a way which would be natural for that library's users (while still being general enough to allow others to experiment with the feature as well). [...] But the library would be useful even if this doesn't happen (and conversely, if the library proves *not* to be useful, it demonstrates that the language change wouldn't actually be as valuable as people had hoped).
[...] So honestly, I'd encourage interested users to get on with implementing the library of their dreams. By all means look ahead to how language syntax improvements might help you, but don't let that stop you getting something useful working right now.
Those who want to try out different syntax changes without actually having to hack on the CPython interpreter directly may be interested in the ideas library. André Roberge, who developed the library, suggested using it as a way to prototype the changes. Ideas modifies the abstract syntax tree (AST) on the fly to enable changes to the input before handing it off to CPython. In another message, he noted that he had implemented the subscript notation for ideas so that it could be tested using Pint or astropy.units.
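The general technique of rewriting the syntax tree before CPython compiles it can be tried with the standard library alone. The sketch below does not use the ideas library or its API; it uses the stdlib ast module to turn a numeric literal subscripted by an expression into a multiplication, and it assumes Python 3.9 or later (where a Subscript node's slice is the expression itself). The unit values are plain floats in meters, chosen only to keep the example self-contained.

```python
import ast

class SubscriptUnits(ast.NodeTransformer):
    """Rewrite `<number>[<expr>]` into `<number> * <expr>`."""

    def visit_Subscript(self, node):
        self.generic_visit(node)
        value = node.value
        is_number = (
            isinstance(value, ast.Constant)
            and isinstance(value.value, (int, float))
            and not isinstance(value.value, bool)
        )
        if not is_number:
            return node   # leave ordinary subscripts such as items[0] alone
        product = ast.BinOp(left=value, op=ast.Mult(), right=node.slice)
        return ast.copy_location(product, node)

def eval_with_units(source, namespace):
    """Parse an expression, apply the rewrite, then evaluate the result."""
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(SubscriptUnits().visit(tree))
    return eval(compile(tree, "<units>", "eval"), namespace)

# Stand-in units: plain floats expressing each unit in meters.
units = {"km": 1000.0, "m": 1.0}
result = eval_with_units("3[km] + 4[m] == 3004[m]", units)
print(result)   # True
```

Swapping the float stand-ins for objects from a real units library would let the same rewritten expression produce proper quantities, which is the kind of experiment Roberge described.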
So far at least, it does not seem like there is a groundswell of activity toward yet another library for units, but one focused in the way that Moore suggested could lead to changes to the language. It may be that the disparate ideas of what unit support would actually mean—and how it would be used—make it hard to coalesce around a single solution. It may also be that the need for additional solutions for Python unit handling is not as pressing as some think. It seems likely that the idea will crop up again, however, so proponents may well want to consider Moore's advice and come up with a unified library before pursuing language changes.
Posted Jul 12, 2022 22:34 UTC (Tue)
by logang (subscriber, #127618)
[Link]
What I'd really like to see is simply a standard library way to print and parse both SI and binary prefixes.
I feel like I've had to rewrite that kind of thing dozens of times -- I can never justify adding a dependency for that, especially for simple scripts that are meant to be portable.
So many command line tools support taking sizes with a k/M/G suffix, it would be nice if argparse just had that option built-in as well as an easy way to print numbers with suffixes using the .format syntax.
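For reference, the kind of helper being described is short enough to sketch with the standard library. The function names parse_size() and format_size() are made up for this example, and the choice to treat k/M/G/T as binary multiples (powers of 1024) is an assumption; a decimal variant would only change the table.

```python
import argparse

# Assumption: suffixes are binary multiples, as many command-line tools treat them.
_FACTORS = {"k": 2**10, "m": 2**20, "g": 2**30, "t": 2**40}

def parse_size(text):
    """Convert strings such as "512", "64k" or "4G" to a number of bytes."""
    text = text.strip()
    suffix = text[-1:].lower()
    if suffix in _FACTORS:
        number, factor = text[:-1], _FACTORS[suffix]
    else:
        number, factor = text, 1
    try:
        return int(float(number) * factor)
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid size: {text!r}")

def format_size(n):
    """Render a byte count with the largest binary prefix that fits."""
    value = float(n)
    for prefix in ("", "Ki", "Mi", "Gi", "Ti"):
        if abs(value) < 1024 or prefix == "Ti":
            return f"{value:.1f} {prefix}B"
        value /= 1024

parser = argparse.ArgumentParser()
parser.add_argument("--limit", type=parse_size, default=0)
args = parser.parse_args(["--limit", "4G"])

print(args.limit)                # 4294967296
print(format_size(args.limit))   # 4.0 GiB
```

Passing the function as the type= argument means argparse reports a malformed size as a normal usage error.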
Posted Jul 13, 2022 0:23 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Jul 13, 2022 4:46 UTC (Wed)
by rsidd (subscriber, #2582)
[Link] (23 responses)
This is disappointing sloppiness from LWN. Metre, centimetre, kilometre, gram, milligram, kilogram are all part of the metric system. SI units are specifically metre for length, kilogram for weight, second for time (second isn't part of the metric system), etc.
Posted Jul 13, 2022 5:28 UTC (Wed)
by k8to (guest, #15413)
[Link]
Posted Jul 13, 2022 8:01 UTC (Wed)
by pbonzini (subscriber, #60935)
[Link] (19 responses)
> second isn't part of the metric system
Are you sure? My understanding is that metric systems are systems like MKS, CGS etc., all of which include the second. The French revolution indeed introduced only the meter and the kilogram, but that's because the second was in use even before. In fact the first attempt of the definition of a meter was "the length of a pendulum that oscillates every second".
Posted Jul 13, 2022 8:24 UTC (Wed)
by smurf (subscriber, #17840)
[Link] (18 responses)
The problem with that idea was that meters and kilograms solved a real problem – it is impossible to determine how many different pounds and feet, or whatever they were called, existed at the time –, while everybody (… for some arbitrary value of "everybody" of course …) already agreed that the day has 24h / 60min / 60sec. Hence that attempt died.
Posted Jul 13, 2022 13:33 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link] (17 responses)
Whether these changes work or not comes down to popularity. Time changes involved considerable infrastructure overhaul which the Revolutionaries could not get done. So you get a dispensation, OK, your town is down with the Republican agenda but you've got this massive town clock keeping time the old way, ordering you to dismantle the clock would turn you against us when we can't handle any more enemies, so, keep your clock "for now" and just do the other stuff nearer the top of our list. A year later, the Republic is looking a bit wobblier, your clock is still standing, and "Replace your clock" isn't even on the list any more.
For the calendar, you've got a problem because the Republican calendar requires the same trick as Lunar calendars popular in Islam and Judaism - somebody has to go look at the sky to check whether it's next year yet. Some people will insist that what matters is whether the appointed astronomer saw the requisite signs, others are content to rely on a model of the universe. The astronomer might not notice, and the model could be wrong. It's a mess.
What we do today is pretty stupid (the IERS astronomers add or remove seconds from UTC to approximate UT1, even though moment by moment we actually track TAI because unlike the Earth's spin it isn't varying) but still less so than the Republican calendar.
The Republicans also tried ten day weeks which blew up for a different reason. Workers used to a six on/ one off pattern were now given nine on/ one off. The revolutionaries said aha, but we give you half of the fifth day extra, and that's better - but the sort of person who can do arithmetic well enough to confirm that yes, strictly 1.5/10 is better than 1/7 is not usually a manual labourer, *half* a day off doesn't feel much like a day off at all, and so it lasted only a few weeks (of either kind) before being abandoned as unworkable.
Posted Jul 13, 2022 16:57 UTC (Wed)
by rgmoore (✭ supporter ✭, #75)
[Link] (5 responses)
The month names are specific to the climate of France; I think they're actually specific to the climate of Paris, so people in Provence would disagree about, e.g. whether Nivôse was actually snowy. They certainly have nothing to do with the seasons here in Southern California, and I assume anyone in the Southern Hemisphere would object strenuously.
Posted Jul 13, 2022 17:03 UTC (Wed)
by pbonzini (subscriber, #60935)
[Link] (4 responses)
Posted Jul 13, 2022 20:35 UTC (Wed)
by rgmoore (✭ supporter ✭, #75)
[Link] (3 responses)
Some of the Roman month names came from the same kind of naming scheme. For example, March was the month devoted to Mars, the god of war, and it was the traditional start to the military campaigning season. June was named for Juno who, among other things, was the goddess of marriage, and June is still the most popular month for weddings. But if you want your calendar to be a truly universal one, it's probably better not to give your months names that are tightly bound to the seasonal happenings in the country where it started. If it does go on to be universal, it will be because those original meanings have become lost over time and people treat the names as if they're arbitrary.
Relating back to the units thing, one can think of the names of units. It's kind of fun that units are often named after scientists who did important work that's related to the thing they're measuring- Watt for power, Ampere for current, Kelvin for temperature, etc.- but plenty of people can use the units just fine without knowing that, say, Henri Becquerel is credited with discovering radioactivity. That's why we can mix units named for people with ones named for other things- meters, moles, and seconds- without too much problem, though it helps to understand why some are capitalized and others not.
Posted Jul 13, 2022 21:02 UTC (Wed)
by excors (subscriber, #95769)
[Link] (2 responses)
I think your examples are wrong (although I might be misinterpreting your intent) - none of the unit names should be capitalised when written in full, so it should be watt, ampere, kelvin, becquerel, etc. (Except for "degrees Celsius" which should keep its C). It's the abbreviations that use capitals iff they're named after a person (W, A, K, Bq, etc) (except for litres which can be L or l for typographic reasons). (See e.g. https://www.bipm.org/utils/common/pdf/si-brochure/SI-Broc... section 5.2, 5.3)
Posted Jul 14, 2022 18:15 UTC (Thu)
by kpfleming (subscriber, #23250)
[Link] (1 responses)
Posted Jul 16, 2022 9:33 UTC (Sat)
by edeloget (subscriber, #88392)
[Link]
"... and to finish your night with a rapid drink, here is a small pool of 4 megaliters of vodka..."
Posted Jul 13, 2022 21:33 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (10 responses)
Well, it's for exactly the same reason that the UK tax year is the 6th April. Until only 250 years ago, the 6th April (actually I think it's the 7th now) WAS New Year's Day, and December WAS the tenth month. We have Pope Gregory to thank for the modern mess.
250 years isn't really long enough to get it into peoples' heads that the name no longer reflects reality (actually, names rarely reflect reality at all :-)
Cheers,
Posted Jul 13, 2022 23:09 UTC (Wed)
by rgmoore (✭ supporter ✭, #75)
[Link] (5 responses)
It's getting pretty deep in the weeds, but the Roman calendar started the year on 1 January even before Julius Caesar reformed it into the Julian system. So even though they had 12 months, the Romans called the last month in the calendar year 10th Month. Supposedly their original calendar had only 10 months, with a period of about 50 days that weren't part of any month, and January and February were added later. It's possible that the calendar originally started in March, with January and February being the 11th and 12th months, and it was only later switched to start in January, but the civil year starting in January was well established before the Julian reforms.
Posted Jul 14, 2022 7:27 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (4 responses)
Of course, there had to be a good reason for Gregory to change the New Year, but ...
Why (and I've seen evidence of this) are many dates between 1550 and 1750 "double year" dates - when the two calendars ran in parallel - but there's no evidence of this before or after?
For evidence, Pepys diaries, and Derby Cathedral - I was a bit shocked at that, I saw a monument and it took me a while to realise *why* the date - something like "20 January 1740/41" - read like that.
Cheers,
Posted Jul 14, 2022 10:53 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link] (1 responses)
Posted Jul 16, 2022 13:03 UTC (Sat)
by Wol (subscriber, #4433)
[Link]
The reason they used both years between 1582 and 1752, was because Europe was on the New System, and we were on the old, so as soon as anything showed any hint of Internationalism you had to make sure it was clear which system you meant.
(Oh - as for the Roman calendar, January is named after Janus, the Roman God who looked both backwards at the old year, and forward to the new. So March was the first month of the year, December was the last, and yes what is now Jan and Feb was "winter".)
Cheers,
Posted Jul 15, 2022 22:55 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Posted Jul 16, 2022 8:05 UTC (Sat)
by Wol (subscriber, #4433)
[Link]
Cheers,
Posted Jul 14, 2022 9:14 UTC (Thu)
by james (subscriber, #1325)
[Link] (1 responses)
When Britain adopted the Gregorian calendar in 1752, eleven days were "lost" to the calendar. The lawyers saw no reason why the tax year should remain fixed to the Church calendar, but some very good ones why the tax year should remain 365 or 366 days long. That moved it to 5 April.
Then 1800 would have been a leap year under the old Julian calendar, but not under the Gregorian calendar, and the Treasury pushed the start of the tax year to 6 April, to match Lady Day in the old Julian calendar.
Posted Jul 18, 2022 15:21 UTC (Mon)
by Wol (subscriber, #4433)
[Link]
I was a bit vague (as always) about 5th/6th/7th April being the Julian 25th March, but I didn't want to overcomplicate by spelling it all out ... and in another 80(ish) years it'll be the 8th April :-) Although I doubt the Treasury will bother moving the tax year.
Cheers,
Posted Jul 14, 2022 9:36 UTC (Thu)
by nowster (subscriber, #67)
[Link]
https://theconversation.com/why-the-uk-tax-year-begins-on...
The year number used to change on March 25th too, which makes historic date handling rather complex.
Posted Jul 14, 2022 12:27 UTC (Thu)
by nsheed (subscriber, #5151)
[Link]
is still well understood in my part of the world (N.E. Scotland).
Posted Jul 13, 2022 15:12 UTC (Wed)
by intelfx (subscriber, #130118)
[Link] (1 responses)
But the SI units system is indeed *often referred to* as the metric system. Even if it is not technically correct. So there's no sloppiness, the article says exactly what it wanted to say.
Posted Jul 13, 2022 16:17 UTC (Wed)
by pbonzini (subscriber, #60935)
[Link]
Posted Jul 13, 2022 5:40 UTC (Wed)
by vulpicastor (subscriber, #122452)
[Link]
The Astropy package, popular in the astronomy community,
has an excellent and robust subpackage
for dealing with units.
It seems to me that it’s not too much of an imposition to write
Custom (numeric) literals don’t fully solve the problem of giving units to arrays, which is essential for many numerical computations.
Having different syntax for literals with units and arrays with units seems inelegant.
In addition,
Astropy overloads the The composition of units is naturally done through the
I am sure there are other contexts that only requires simple units with primitive numeric type where the custom literal approach might work out. But the approaches mentioned in the article, for reasons detailed above, seem inadequate for the current needs of scientific numerical computing.
Posted Jul 13, 2022 6:05 UTC (Wed)
by smurf (subscriber, #17840)
[Link] (20 responses)
There's some precedent for that, e.g. the way floats are handled (rounding, errors etc.).
Posted Jul 13, 2022 6:43 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (19 responses)
Posted Jul 14, 2022 7:30 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (18 responses)
Officially, we now have the Ki, Mi, Gi prefixes to mean base 2. Which of course now causes its own problems when I believe the official definition of "a gigabyte of disk" is now an MKiB?
Cheers,
Posted Jul 14, 2022 10:09 UTC (Thu)
by geert (subscriber, #98403)
[Link] (17 responses)
The packaging of the last hard drive I bought says "When referring to drive capacity, one gigabyte, or GB, equals one billion bytes and one terabyte, or TB, equals one thousand billion bytes." (some online variant used "trillion" instead of "thousand billion").
We can still discuss about the meaning of "billion" and "trillion" ;-)
Posted Jul 14, 2022 15:32 UTC (Thu)
by esemwy (guest, #83963)
[Link] (4 responses)
Posted Jul 28, 2022 16:35 UTC (Thu)
by sammythesnake (guest, #17693)
[Link] (3 responses)
Million: 1,000,000
I prefer this style for the cool extra words, because it feels fractionally logical to my brain, but also because it gives a lot more headroom before running out of names I can remember :-D
Sadly, the "short scale" (with billion=1000 million, trillion=1000 billion etc.) is very definitely winning, partly because of US culture being so influential internationally, but also because it's the norm in scientific usage.
Now that I think of it, I wonder how we ended up with two conventions on this space in the first place...
Posted Jul 28, 2022 17:08 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (2 responses)
And I've never heard of your long scale, to me a billion was a million^2, a trillion was a billion^2. Easily described, you can have a million billion no problem ... (apart from it being a huge amount of whatever :-)
Cheers,
Posted Jul 28, 2022 18:13 UTC (Thu)
by mpr22 (subscriber, #60784)
[Link]
https://en.wikipedia.org/wiki/Long_and_short_scales#Long_...
American English has been using short scale since before the USA was a country.
France, bizarrely, switched from short scale to long scale in the 20th century (this being officially confirmed in 1961).
British official usage was declared to be short scale in 1974, on the occasion of the Tory member for Tiverton asking Harold Wilson if he was going to affirm British official usage to be long scale.
Posted Jul 30, 2022 15:39 UTC (Sat)
by smurf (subscriber, #17840)
[Link]
Posted Jul 14, 2022 18:23 UTC (Thu)
by anselm (subscriber, #2796)
[Link]
Of course it does – it makes the drive seem bigger! At least to people who naïvely assume that one terabyte is 2^40 bytes (what SI calls “one tebibyte”).
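The size of the gap between the two readings is easy to work out. A short calculation, using only the definitions of the decimal terabyte (10^12 bytes) and the binary tebibyte (2^40 bytes), with a 4TB drive as an arbitrary example:

```python
TB = 10**12      # decimal terabyte, as used on drive packaging
TiB = 2**40      # binary tebibyte

drive = 4 * TB   # a drive sold as "4TB"

# How large the drive looks when measured in binary units.
print(f"{drive / TiB:.2f} TiB")          # 3.64 TiB

# Relative shortfall of a decimal terabyte against a tebibyte.
print(f"{(1 - TB / TiB) * 100:.1f}%")    # 9.1%
```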
Posted Jul 18, 2022 15:26 UTC (Mon)
by Wol (subscriber, #4433)
[Link]
(Certainly when fdisk tells me how big my 4TB hard drive is, it comes out at rather more than 4 billion (that is 4x(10^6)^2 :-)
Cheers,
Posted Jul 19, 2022 22:50 UTC (Tue)
by nybble41 (subscriber, #55106)
[Link] (9 responses)
RAM manufacturers, on the other hand, stuck mostly to binary sizes since RAM modules scale based on the number of address and data lines. The exception would be the modules with an odd number of data lines intended for parity or ECC bits... but the usable space after ECC is still generally a power of two.
The SI unit for information is the *bit*. Insisting on the SI definition of "kilo" with *bytes* as the base unit makes no sense; in pure SI terms you're measuring in multiples of 8,000 (or 8,000,000 etc.) SI base units, not powers of 1,000. The prefixes used for SI units can have other meanings in different contexts; no one insists that a microservice must be exactly one-millionth of a service, for example.
Unfortunately this has been muddied to the point that a simple "KB" or "kilobyte" can never again be considered unambiguous, so when precision matters I suggest using "KiB" for "binary kilobyte" or "KeB" for "decimal kilobyte". Forget about "kibibyte"; that just sounds stupid. (But if you insist-- the decimal equivalent can only be "kedebyte".) Or you can measure the data in bits rather than bytes, with an unambiguous SI decimal prefix.
Posted Jul 20, 2022 7:00 UTC (Wed)
by geert (subscriber, #98403)
[Link] (8 responses)
Posted Jul 22, 2022 1:38 UTC (Fri)
by nybble41 (subscriber, #55106)
[Link] (7 responses)
Within the SI system, sure. But as I said, bytes are not an SI unit, so SI prefixes do not apply. In another context the "kilo" prefix can easily mean something else entirely—even 1024.
Practically speaking, "kB" or "kilobyte" means either 1024 bytes (the traditional version dating back to the early days of binary computers, and an integer power of two in bits) or 1000 bytes (the new version coerced into the ill-fitting SI system, mixing base-2 and base-10 to arrive at 8,000 bits). A reader can't tell which you meant, so if the difference between 1024 and 1000 matters at all then you should avoid the term altogether. I only offered an unambiguous alternative modeled on the KiB / "kibibyte" nomenclature. It's not SI but it does a far better job of communicating the intent.
If you want to stick with SI, don't talk about bytes. The SI base unit for information is the bit.
Posted Jul 22, 2022 2:49 UTC (Fri)
by rschroev (subscriber, #4164)
[Link]
Kilo meaning 1024 is the new version, invented in the early days of binary computers; it was already wrong then, in conflict with both existing standards and the Greek word χίλιοι (chilioi) from which it is derived, which literally means 1000.
Kilo meaning 1000 is the traditional version, consistent with existing usage dating back to the end of the 18th century, long before the rise of binary computers; and consistent with the Greek word going back a few thousand years further still.
Posted Jul 30, 2022 4:16 UTC (Sat)
by JanC_ (guest, #34940)
[Link] (5 responses)
The only problem with using bytes is that the size of a byte is not fixed (it is hardware-dependent), so you have to specify somewhere what size the bytes you are talking about are…
It would have been much better if English language computer engineers had used 'octet' for “modern” 8-bit bytes instead (as the French do).
Posted Jul 30, 2022 9:27 UTC (Sat)
by mpr22 (subscriber, #60784)
[Link] (3 responses)
Outside of what are now very niche contexts, this is not a serious concern.
Posted Jul 30, 2022 11:36 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (2 responses)
Niche contexts ... like networking?
I was always under the impression that you can't divide your networking kb by 8 to get your data transfer kB, because an 8-bit data byte is about a 10-bit network byte ...
(or is it because the b in networking stands for baud which is most definitely not a bit ...)
Cheers,
Posted Jul 30, 2022 13:47 UTC (Sat)
by pizza (subscriber, #46)
[Link]
That's still a good rule of thumb, as when you factor in network/protocol overhead, it works out pretty consistently:
10Mbps =~ 1MB/s, 100Mbps =~ 10MB/s, 1000Mbps =~ 100MB/s
(Over 1Gbps it tends to fall off somewhat; for example the most I recall getting using 10Gbps fiber (and 9K jumbo frames) was about 550MB/s, though that was probably CPU bound as I was using 'scp')
Posted Jul 30, 2022 19:29 UTC (Sat)
by mpr22 (subscriber, #60784)
[Link]
It depends on where the network technology's speed rating is measured.
For example, 100BASE-TX's rated speed of 100 Mbit is measured in terms of the 25MHz 4-bit parallel data stream fed to the MII, not the 125 MHz run-length-limited serial data stream the 4b5b encoder behind the MII feeds to the MLT-3 encoder that generates the three-level waveform seen on the wire.
Of course, 100 Mb/s of packet data transfer doesn't translate into 12.5 MB/s of actual application data transfer, because of the protocol overheads imposed by various layers.
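The 100BASE-TX figures above can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in this comment; the variable names are illustrative and 8-bit bytes are assumed:

```python
# Figures taken from the comment above; the names are illustrative.
mii_clock_hz = 25_000_000     # 25 MHz clock at the MII
mii_width_bits = 4            # 4-bit parallel data path

# Rated speed: packet data bits per second as seen at the MII.
data_rate = mii_clock_hz * mii_width_bits

# 4b5b sends 5 code bits for every 4 data bits, hence the faster serial stream.
line_rate = data_rate * 5 // 4

# Upper bound on payload in bytes/s, before any protocol overhead.
ceiling_bytes_per_s = data_rate // 8

print(data_rate, line_rate, ceiling_bytes_per_s)
```

So the "divide by 8" rule is exact at the MII, and the shortfall in application throughput comes from the protocol layers above it rather than from the line coding.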
Posted Jul 30, 2022 15:42 UTC (Sat)
by smurf (subscriber, #17840)
[Link]
Posted Jul 13, 2022 7:15 UTC (Wed)
by Karellen (subscriber, #67644)
[Link] (14 responses)
Huh. I did not read 10e1j as indeed meaning (10e1)j. Rather, I read it as 10e(1j).
Posted Jul 13, 2022 7:50 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (10 responses)
Posted Jul 13, 2022 11:55 UTC (Wed)
by willy (subscriber, #9762)
[Link]
Posted Jul 13, 2022 17:05 UTC (Wed)
by Karellen (subscriber, #67644)
[Link] (8 responses)
I wonder, if it was written "10×10^1j", where would the precedence of the "j" go? Lower than "×"? Or equivalent but still applied last because of its rightmost position?
"Imaginary part" is always forgotten on the operation order-of-precedence lists. I wonder if it feels left out! :-)
Posted Jul 13, 2022 19:38 UTC (Wed)
by dtlin (subscriber, #36537)
[Link] (7 responses)
I don't really see the need for a postfix unit syntax. Using the usual * and / operators generalizes to stuff like "ft·lb" and "m/s" which would be awkward to express as single tokens. A library (such as Pint in the article or astropy.units mentioned below) with appropriate operator overloads should be enough.
Posted Jul 14, 2022 1:29 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (6 responses)
In Python (not math), the notation 10e1 is *not* a combined-multiplication-and-exponentiation operator applied to the arguments 10 and 1. Rather, it is another way of writing the float literal 100.0. Similarly, in Python, the j suffix is not a multiply-by-𝑖 postfix operator. Rather, it is the suffix for an imaginary literal. You cannot write some_varj and expect it to multiply some_var by 𝑖, nor can you write some_vare1 and expect it to multiply some_var by 10, because in both cases, the syntax is meant for constructing literals. This also means that both of these syntaxes have the highest possible precedence, since they're not proper operations at all, they're just literal notation. This cannot even be overridden by parentheses. 10e(1j) is a syntax error, not 10 × 10^𝑖. The latter can only be written explicitly using the * and ** operators (or equivalent functions from the cmath module).
To be even more pedantic: The imaginary literal syntax consists of a (real) float literal followed by the j suffix. So you have to parse a real literal before you can even deal with parsing an imaginary literal, and that's why the j has to bind less tightly than the e. If you had instead written 10 * 10 ** 1j, then the opposite would happen, and you really would get 10 × 10^𝑖 (i.e. the j binds more tightly than the exponent operator, contrary to PEMDAS and similar rules).
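The distinction between literal syntax and operators can be checked directly in the interpreter. A small sketch (the variable names are mine):

```python
import cmath
import math

# Both spellings are single literals, so no operator precedence is involved.
float_literal = 10e1          # the float 100.0
imaginary_literal = 10e1j     # a float literal plus the j suffix: 100j

# Parentheses cannot regroup the pieces of a literal.
try:
    compile("10e(1j)", "<example>", "eval")
except SyntaxError:
    parses = False
else:
    parses = True

# With explicit operators, ** binds tighter than *, and 1j is its own literal,
# so this really is 10 * (10 ** i).
explicit = 10 * 10 ** 1j
via_euler = 10 * cmath.exp(1j * math.log(10))   # same value by Euler's formula

print(float_literal, imaginary_literal, parses, explicit)
```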
Posted Jul 30, 2022 16:53 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (5 responses)
Ten SQUARED is a hundred - 10e2.
Cheers,
Posted Jul 30, 2022 17:57 UTC (Sat)
by anselm (subscriber, #2796)
[Link] (4 responses)
Posted Jul 30, 2022 20:34 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (3 responses)
But 10e1 is rather odd notation - 10 x 10^1
I'm used to either scientific notation where it's en and n is a multiple of 3, or (dunno what it's called) where 1 <= mantissa < 10.
( I know some people prefer the mantissa between 0 and 1, rather than 1 and 10, but the combination of mantissa not between 0 and ten, and exponent not a multiple of 3, is, well, weird!)
Cheers,
Posted Jul 30, 2022 23:51 UTC (Sat)
by rschroev (subscriber, #4164)
[Link] (1 responses)
When communicating with other people it's probably best to stick with either scientific notation in the strict sense (1 <= mantissa < 10) or engineering notation (exponent is a multiple of 3, 1 <= mantissa < 1000), but other representations are just as valid and programming languages don't care in the least how you scale the exponent.
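To make the two conventions concrete, here is a rough sketch. The `scientific` helper just wraps Python's built-in `e` format; `engineering` is a hypothetical helper written for this example, and it may misplace values that sit within floating-point rounding of a power of 1000:

```python
import math

# The same float, however the exponent is scaled in the literal.
same_value = 10e1 == 1e2 == 0.1e3 == 100e0 == 100.0


def scientific(x: float) -> str:
    """Scientific notation in the strict sense: 1 <= mantissa < 10."""
    return f"{x:.3e}"


def engineering(x: float) -> str:
    """Hypothetical helper: exponent a multiple of 3, 1 <= mantissa < 1000."""
    if x == 0:
        return "0e0"
    exponent = math.floor(math.log10(abs(x)))
    exponent -= exponent % 3          # round down to a multiple of 3
    mantissa = x / 10 ** exponent
    return f"{mantissa:g}e{exponent}"


print(same_value, scientific(12000.0), engineering(12000.0), engineering(100.0))
```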
Posted Jul 31, 2022 7:50 UTC (Sun)
by Wol (subscriber, #4433)
[Link]
But brains usually do ... :-)
Cheers,
Posted Jul 31, 2022 17:21 UTC (Sun)
by anselm (subscriber, #2796)
[Link]
Scientific pocket calculators (are these still a thing?) used to call the “1 ≤ mantissa < 10 and arbitrary exponent” style “scientific” and the “1 ≤ mantissa < 1000, exponent a multiple of 3” style “engineering”.
Posted Jul 22, 2022 9:54 UTC (Fri)
by Darkstar (guest, #28767)
[Link] (2 responses)
Posted Jul 22, 2022 13:58 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
"j" is used for the imaginary unit instead of "i" in any field where "i" would be potentially confusing. Electrical engineering, for example, uses "I" for currents, and so talking about an AC current of I = (5 + 3i) / (4 + 2i) has room for confusion when spoken, where I = (5 + 3j) / (4 + 2j) does not.
Posted Jul 23, 2022 7:02 UTC (Sat)
by smurf (subscriber, #17840)
[Link]
Also, well, mathematicians use i while engineers use j, because i is either current or an index. Python had to choose one, so Guido picked j.
See also https://bugs.python.org/issue10562
Posted Jul 13, 2022 14:55 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link]
d = 3.2metre + 6.1inch + 8mile
... implies the existence of a distance type, for which metre, inch and mile are just constant factors, otherwise why shan't I write:
x = 1mile + 1kilogram + 1hour # What does this mean?
In a language like C++ we can assume that 3.2metre has some type my::distance and so if we find a way to write 4mile, either that's also a my::distance (and so this actually does what we wanted) or else it's some::other::property and we can't add them together so we know we need an adaptor. There's no risk that oops, 3.2metre + 4mile = 9.2inch and our clever unit system actually made things more confusing not less.
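In Python the same idea would have to be enforced at run time rather than by the compiler. A minimal sketch, assuming hypothetical `Distance` and `Mass` classes and the existing `number * unit` spelling instead of a new literal syntax:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Distance:
    metres: float                      # one canonical unit for every Distance

    def __add__(self, other):
        if not isinstance(other, Distance):
            return NotImplemented      # Python then raises TypeError
        return Distance(self.metres + other.metres)

    def __rmul__(self, factor):
        # Handles number * unit, e.g. 3.2 * metre.
        if isinstance(factor, (int, float)):
            return Distance(factor * self.metres)
        return NotImplemented


@dataclass(frozen=True)
class Mass:
    kilograms: float

    def __rmul__(self, factor):
        if isinstance(factor, (int, float)):
            return Mass(factor * self.kilograms)
        return NotImplemented


# metre, inch and mile are just constant factors of the one distance type.
metre = Distance(1.0)
inch = Distance(0.0254)
mile = Distance(1609.344)
kilogram = Mass(1.0)

d = 3.2 * metre + 6.1 * inch + 8 * mile    # fine: all three are distances

try:
    x = 1 * mile + 1 * kilogram            # rejected: different kinds
except TypeError as exc:
    error = exc
```

The mistake is caught only when the line runs, which is the difference from the C++ case, where the type mismatch stops compilation.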
Posted Jul 13, 2022 17:10 UTC (Wed)
by xyz (subscriber, #504)
[Link]
A Physical Units Library For the Next C++ - Mateusz Pusz - CppCon 2020
Posted Jul 13, 2022 19:57 UTC (Wed)
by nickodell (subscriber, #125165)
[Link] (2 responses)
I was curious why this happens, so I took a look at the Pint source code. It seems that Pint defines the furlong in terms of the survey foot, which is defined as exactly 1200/3937 meters, while the normal foot is defined as exactly 0.3048 meters.
>>> (1 * ureg.furlongs / ureg.survey_foot).to('')
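The size of the discrepancy follows from the two foot definitions alone. A sketch using exact fractions rather than Pint, assuming the usual 660 feet to the furlong:

```python
from fractions import Fraction

SURVEY_FOOT_M = Fraction(1200, 3937)    # US survey foot, in metres
INTL_FOOT_M = Fraction(3048, 10000)     # international foot, 0.3048 m
FEET_PER_FURLONG = 660                  # 10 chains of 66 feet

# How much longer the survey foot is than the international foot.
ratio = SURVEY_FOOT_M / INTL_FOOT_M     # exactly 500000/499999

# A furlong of 660 survey feet, expressed in international feet.
furlong_in_intl_feet = FEET_PER_FURLONG * ratio

print(ratio, float(furlong_in_intl_feet))
```

The two feet differ by two parts per million, which is where the stray digits come from when a survey-foot furlong is converted to ordinary feet.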
Posted Jul 14, 2022 9:57 UTC (Thu)
by nowster (subscriber, #67)
[Link] (1 responses)
And then we come on to volume measures...
Posted Jul 15, 2022 18:46 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link]
Posted Jul 13, 2022 20:22 UTC (Wed)
by mb (subscriber, #50428)
[Link]
8@u_kOhm
That's kind of useful and easier to read than plain literals. But I'd still prefer something like
8_kOhm
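One way an expression like 8@u_kOhm can be made to work in today's Python is through the hook for the @ operator. A minimal sketch, assuming hypothetical `Unit` and `Quantity` classes; this is not the API of any particular library:

```python
from typing import NamedTuple


class Quantity(NamedTuple):
    value: float       # magnitude in the base unit
    base_unit: str


class Unit:
    """Hypothetical unit object: a base-unit symbol plus a scale factor."""

    def __init__(self, base_unit: str, scale: float) -> None:
        self.base_unit = base_unit
        self.scale = scale

    def __rmatmul__(self, value):
        # Called for `value @ unit` because int and float do not implement @.
        return Quantity(value * self.scale, self.base_unit)


u_kOhm = Unit("Ohm", 1e3)
u_nF = Unit("F", 1e-9)

resistance = 8@u_kOhm
capacitance = 22@u_nF
print(resistance, capacitance)
```

A spelling like 8_kOhm, by contrast, cannot be emulated this way, since it would need a change to the language's literal syntax.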
Posted Jul 13, 2022 21:43 UTC (Wed)
by pj (subscriber, #4506)
[Link] (7 responses)
Posted Jul 14, 2022 0:56 UTC (Thu)
by k8to (guest, #15413)
[Link] (4 responses)
Posted Jul 14, 2022 7:37 UTC (Thu)
by taladar (subscriber, #68407)
[Link] (3 responses)
Posted Jul 14, 2022 8:07 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (2 responses)
That's an important point. There are two uses of units in a program:
The first is very Pythonic in nature - the second amounts to a static type system for units, and I'm not sure how that fits in with the dynamic typing Python prefers.
Posted Jul 15, 2022 19:09 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link]
The problem with the existing support is that, for example, [length] + [length] should be of type [length], but if you try to do that with NewType, you just get back float or int (types aren't preserved across binary operations). That's fixable for addition and subtraction relatively easily (define and type hint appropriate dunder methods), but it would be more complicated in the case of multiplication and division, because Python's generics are insufficiently advanced to support e.g. [length] / [time] = [speed]. You could hard-code all of those conversions one at a time, and perhaps generate the code somehow, but a lot of fundamental constants have weird units that wouldn't necessarily have a "standard" interpretation, such as the Boltzmann constant ([length]^2 * [mass] * [time]^-2 * [temperature]^-1), and it would not be fun to hard code those as well.
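The NewType limitation, and the dunder-method workaround for addition and subtraction, can be sketched as follows. `Length` and `Metres` are hypothetical names used only for this example:

```python
from typing import NewType

# NewType only exists for the type checker; at run time Length(x) is just x.
Length = NewType("Length", float)

a = Length(3.0)
b = Length(4.0)
total = a + b          # inferred as float, not Length: the unit is lost


class Metres:
    """Hypothetical wrapper whose + and - are hinted to preserve the type."""

    def __init__(self, value: float) -> None:
        self.value = value

    def __add__(self, other: "Metres") -> "Metres":
        return Metres(self.value + other.value)

    def __sub__(self, other: "Metres") -> "Metres":
        return Metres(self.value - other.value)


kept = Metres(3.0) + Metres(4.0)    # still Metres, at run time and for a checker
print(type(total).__name__, type(kept).__name__, kept.value)
```

This covers [length] + [length], but every product or quotient such as [length] / [time] would still need its own hand-written class and method.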
Ideally, a unit library should allow you to multiply and divide whatever by whatever and just make sense of it, but that would require the ability to put non-type arguments into Python's generics. Then we could express the Boltzmann constant's type as something like Quantity[2, 1, -2, 0, -1, 0, 0], where each number indicates the exponent of a given unit. Right now, Python does not let you do that (to the best of my understanding), because the arguments have to be types, not integers.
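For comparison, the exponent-vector idea is straightforward to do at run time, even though it cannot currently be pushed into the static types. A sketch only: the `Quantity` class here is hypothetical, and the order of the seven exponents (length, mass, time, current, temperature, amount, luminous intensity) is my assumption:

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed exponent order: length, mass, time, current, temperature,
# amount of substance, luminous intensity.
Dims = Tuple[int, ...]

LENGTH: Dims = (1, 0, 0, 0, 0, 0, 0)
TIME: Dims = (0, 0, 1, 0, 0, 0, 0)
TEMPERATURE: Dims = (0, 0, 0, 0, 1, 0, 0)
ENERGY: Dims = (2, 1, -2, 0, 0, 0, 0)
BOLTZMANN: Dims = (2, 1, -2, 0, -1, 0, 0)


@dataclass(frozen=True)
class Quantity:
    value: float
    dims: Dims

    def __mul__(self, other: "Quantity") -> "Quantity":
        # Multiplying quantities adds their exponents.
        dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value * other.value, dims)

    def __truediv__(self, other: "Quantity") -> "Quantity":
        # Dividing quantities subtracts their exponents.
        dims = tuple(a - b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value / other.value, dims)

    def __add__(self, other: "Quantity") -> "Quantity":
        # Addition only makes sense between identical dimensions.
        if self.dims != other.dims:
            raise TypeError(f"cannot add {self.dims} and {other.dims}")
        return Quantity(self.value + other.value, self.dims)


k_B = Quantity(1.380649e-23, BOLTZMANN)
thermal_energy = k_B * Quantity(300.0, TEMPERATURE)     # comes out as energy
speed = Quantity(100.0, LENGTH) / Quantity(10.0, TIME)  # [length] / [time]
```

The checks happen when the code runs; a static checker sees only `Quantity` and cannot tell a speed from an energy, which is the gap described above.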
However, this also raises more philosophical problems. There are seven base units in the SI system, so one might assume that Quantity should have a fixed arity of seven exponents. However, the SI system doesn't cover many commonly-used units, such as bytes. There are also logarithmic units such as the bel (or decibel) and the cent (used in music theory, not to be confused with various currencies), which are subject to more complicated coherence rules than typical (linear) units. And the radian is officially a derived unit of the form m/m (so that the equation s=rθ is valid), but that would just simplify to dimensionless unless you special-case it somehow (the steradian has the same problem).
Posted Jul 24, 2022 23:41 UTC (Sun)
by pdundas (guest, #15203)
[Link]
Interestingly you can *multiply* or *divide* distance and time, giving a quantity with a complex unit - speed might be 5m/30s (or 10m/minute, or some other number in furlongs per microfortnight). All kinds of weird and wonderful composite units are available - as some posters mentioned earlier. Or consider electrical units - Amps are Coulombs per second, Volts are Joules per Coulomb, and Watts are Joules per second - or something like that - it's been a while. Which raises interesting possibilities for *display* of numbers with units attached, when they need to be scaled, or converted between families of units that measure the same thing, or expressed (as for Current) in a particular way.
As for how to do that in Python, I've no idea. But it's a fascinating problem.
Posted Jul 14, 2022 1:10 UTC (Thu)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
In contrast if I have 0.0508 metres of something I also have 2 inches of it, that's a conversion, I don't have less of it or more of it, I just changed my units of measurement.
The thornier case is situations where arguably there is a conversion, but pragmatically that's never going to be what you meant. If I have a kilogram, and I want joules, I probably need to re-examine my priors rather than use Einstein's famous equation which gives a very large number indeed for this conversion.
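To give a sense of how large that number is, here is the arithmetic for one kilogram (a sketch; the variable names are mine):

```python
C = 299_792_458          # speed of light in m/s, exact by definition

mass_kg = 1
energy_j = mass_kg * C ** 2    # E = m * c**2, in joules

print(f"{energy_j:.3e} J")
```

That is roughly 9 × 10^16 joules, which a unit library would happily compute if mass-to-energy were registered as an ordinary conversion.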
Posted Jul 30, 2022 17:00 UTC (Sat)
by Wol (subscriber, #4433)
[Link]
??? Isn't that exactly what the guy on the other side of your trade just DID?
Whether the transaction is reversible has nothing to do with the transaction, and everything to do with whether the parties are trading or buying/selling.
Cheers,
Posted Jul 14, 2022 7:27 UTC (Thu)
by niner (subscriber, #26151)
[Link] (4 responses)
use Physics::Measure :ALL;
# Define a distance and a time
# Calculate speed and acceleration
As a special treat it allows working with measurement errors:
my \d = 10m ± 1;
More examples (including how to use non-SI units) are to be found in the docs at https://github.com/p6steve/raku-Physics-Measure
All of this is lexically scoped, like almost everything in Raku. While the module itself does not export postfix operators for non-SI units, it would be easy for another module to provide those (they are really just small wrappers around object constructors). Thanks to lexical scope, different definitions of "mile", etc. wouldn't be a problem.
All of this comes from the power of supporting custom operators in the language. Python already supports overloading operators via magic methods like __add__. All it would need would be the possibility to add new operators. Postfix could give you units, infix operators even those nifty measurement errors. Sadly, I would bet that Python will just not get that support, as it would require making the language grammar extensible and so far ease of parsing has always been a design priority for Python.
Posted Jul 15, 2022 21:00 UTC (Fri)
by jrwren (subscriber, #97799)
[Link] (2 responses)
Only if it is implemented poorly. SI clearly defines K as kelvin, period. k is kilo and Ki is kibi (although maybe not SI)
https://www.nist.gov/pml/owm/metric-si-prefixes
Posted Jul 18, 2022 14:44 UTC (Mon)
by jwilk (subscriber, #63328)
[Link]
Posted Jul 19, 2022 20:54 UTC (Tue)
by p6steve (guest, #159775)
[Link]
say 29K.in('°C'); # -244.15°C
Sadly, Kibi (and other computing units) are not yet implemented.
Posted Jul 17, 2022 19:34 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link]
(Yes, Raku is not Perl, but it's close enough for the comparison to still be relevant.)
Posted Jul 22, 2022 8:27 UTC (Fri)
by callegar (guest, #16148)
[Link]
Native Python support for units?
SI ≠ metric
French Republican time and dates
French Republican calendar seems better, at least the names and regularity of month length - Fructidor in particular appeals to me.
French Republican time and dates
Wol
French Republican time and dates
Wol
French Republican time and dates
Wol
French Republican time and dates
Wol
Actually, before Britain adopted the Gregorian calendar, the beginning of the tax and legal year was Lady Day (25 March, the Church feast celebrating the angel Gabriel telling the Virgin Mary that she was to conceive Jesus: nine months before Christmas).
French Republican time and dates
Wol
French Republican time and dates
SI ≠ metric
Native Python support for units?
1 * u.cm rather than 1cm, after one writes import astropy.units as u. Other than the one-letter variable u, it avoids the namespacing problem of having units implemented by incompatible libraries.
There are also features provided by astropy.units that are imo difficult or impossible to express with custom literals.
<< operator to express the idea of attaching units to an array without copying, so my_array << u.kg is much faster than my_array * u.kg for large arrays. For the same reason, even for an array with literal values given in code, it might make more sense to write numpy.array([1, 2, 3]) << u.kg, which results in an implementation-defined wrapper class around the Numpy array with units, than numpy.array([1kg, 2kg, 3kg]), which results in a Numpy array of object pointers to quantities with units.
*, /, and ** operators. The unit of the Newtonian gravitational constant, for example, is kg^-1 m^3 s^-2 in base SI units. This kind of semantic seems difficult to express flexibly and extensibly using a custom literal system, unless one defines a whole new syntax for composing literal suffixes that essentially duplicates the three *, /, and ** operators anyway. This is made even worse by the possibility of fractional powers of units. An example would be the unit of electrical charge in the non-SI Gaussian cgs system, which can be expressed as cm^(3/2) g^(1/2) s^-1.
Contexts
Wol
Contexts
Milliard: 1,000 million
Billion: 1,000 milliard (a million million - a "bi-illion")
Billiard: 1,000 billion
Trillion: 1,000 billiard (a million million million - a "tri-illion")
Trilliard: 1,000 trillion etc.
Contexts
Wol
Contexts
"Winning" long vs. short scale
Contexts
The packaging of the last hard drive I bought says "When referring to drive capacity, one gigabyte, or GB, equals one billion bytes and one terabyte, or TB, equals one thousand billion bytes."
Contexts
Wol
Contexts
Wol
Contexts
Native Python support for units?
(They can of course be combined, 10e1j does indeed mean 100*sqrt(-1).)
Native Python support for units?
Wol
Native Python support for units?
Don't you mean 10e1 is 10.0 - 10 to the power 1?
$ python3
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 1e1
10.0
>>> 1e2
100.0
>>> 10e1
100.0
>>>
Native Python support for units?
Wol
Native Python support for units?
Wol
Native Python support for units?
I'm used to either scientific notation where it's en and n is a multiple of 3, or (dunno what it's called) where 1 ≤ mantissa < 10.
Native Python support for units?
Units without types?
Native Python support for units?
What I found interesting was the way the authors discussed the challenges and pitfalls of this topic:
https://youtu.be/7dExYGSOJzo?list=RDCMUCMlGfpWw-RUdWX_JbL...
Furlongs
<Quantity(660.0, 'dimensionless')>
Furlongs
Native Python support for units?
22@u_nF
22_nF
Native Python support for units?
Wol
Native Python support for units?
my \d = 42m; say d; #42 m (Length)
my \t = 10s; say t; #10 s (Time)
my \u = d / t; say u; #4.2 m/s (Speed)
my \a = u / t; say a; #0.42 m/s^2 (Acceleration)
my \t = 8s ± 2;
say d / t # 1.25m/s ±0.4
Native Python support for units?
#viz. https://en.wikipedia.org/wiki/International_System_of_Units
say 29km.in('miles'); # 18.019765mile
Native Python support for units?
I may be missing something, but isn't this all syntactic sugar to save a "*" that is probably better left in place? The class system and the operator overloading mechanism have already proven to be powerful enough to define a `Quantity` class, so you can write `12*mm`, and that does not seem particularly less readable to me than `12mm` or `12_mm` or `1[mm]`. The use of `*` also seems to be the appropriate thing: when you write `12mm` you really mean `12*mm` (the unit taken twelve times). Why should `12*mm` be worse than `12[mm]` (one char longer to type) or `12_mm`? In fact, the explicit use of `*` shows its advantages when you start speaking of compound units like `12*(m/s)`, which seems more readable and easier to parse than `12_m/s`. It is also easier to write `a * mm` than `a*1[mm]` if you want to apply the unit to a number stored in a variable. Incidentally, note that some languages support the concept of implicit multiplication (e.g. Mathematica). If you write `1m` there, it actually means `1 * m`. So if anything, shouldn't the discussion be about cases in which to allow an implicit multiplication?
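As a rough illustration of how little machinery the `*` spelling needs, here is a minimal sketch. The `Unit` and `Quantity` classes are hypothetical and only track a symbol and a scale factor; a real library does far more:

```python
class Unit:
    """Hypothetical unit: a symbol plus a scale factor to the base unit."""

    def __init__(self, symbol: str, scale: float = 1.0) -> None:
        self.symbol = symbol
        self.scale = scale

    def __truediv__(self, other: "Unit") -> "Unit":
        # Compound units such as m/s are built with the ordinary / operator.
        return Unit(f"{self.symbol}/{other.symbol}", self.scale / other.scale)

    def __rmul__(self, value):
        # number * unit attaches the unit, whether the number is a literal
        # or a variable.
        return Quantity(value, self)


class Quantity:
    def __init__(self, value, unit: Unit) -> None:
        self.value = value
        self.unit = unit

    def __repr__(self) -> str:
        return f"{self.value} {self.unit.symbol}"


m = Unit("m")
s = Unit("s")
mm = Unit("mm", 1e-3)

width = 12*mm
speed = 12*(m/s)
a = 7.5
length = a * mm
print(width, speed, length)
```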
Native Python support for units?