English improved

2026-03-05 20:09:16 +01:00
parent c6609d15f5
commit 733fe8c290
21 changed files with 954 additions and 1042 deletions


@@ -31,24 +31,24 @@ for x ∈ ( 3, 3.3e4, Int16(20), Float32(3.3e4), UInt16(9) )
end
```
## Integer Numbers *(integers)*
## Integers
Integer numbers are fundamentally stored as bit patterns of fixed length. Therefore, the value range is finite.
Integers are stored as fixed-length bit patterns. Therefore, the value range is finite.
**Within this value range**, addition, subtraction, multiplication, and integer division with remainder
are exact operations without rounding errors.
Integer numbers come in two types: `Signed` (with sign) and `Unsigned`, which can be viewed as machine models for ℤ and ℕ, respectively.
Integer numbers come in two types: `Signed` and `Unsigned`, which can be viewed as machine models for ℤ and ℕ, respectively.
### *Unsigned integers*
```{julia}
subtypes(Unsigned)
```
UInts are binary numbers with n=8, 16, 32, 64, or 128 bits length and the corresponding value range of
`UInts` are binary numbers with a bit width of $n$ = 8, 16, 32, 64, or 128 and the corresponding value range of
$$
0 \le x < 2^n
$$
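As a quick check of this formula, the following sketch prints the value range of each unsigned type with `typemin`/`typemax` and shows that `UInt8` arithmetic wraps around modulo $2^8$. The loop and the chosen examples are only illustrative:

```julia
# Value ranges 0 ≤ x < 2^n of the unsigned integer types
for T in (UInt8, UInt16, UInt32, UInt64, UInt128)
    println(rpad(string(T), 8), typemin(T), " … ", typemax(T))
end

# Arithmetic wraps around modulo 2^n, here with n = 8
@show 0xff + 0x01
@show typemax(UInt16) == 2^16 - 1
```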
They are used relatively rarely in *scientific computing*. In hardware-proximate programming, they are e.g. used for handling binary data and memory addresses. Therefore, Julia displays them by default as hexadecimal numbers (with prefix `0x` and digits `0-9a-f`).
They are used rarely in *scientific computing*. In low-level hardware programming, they are used, e.g., for handling binary data and memory addresses. By default, Julia displays them as hexadecimal numbers (with prefix `0x` and digits `0-9a-f`).
```{julia}
x = 0x0033efef
@@ -74,7 +74,7 @@ In Julia, integer numbers are 64-bit by default:
x = 42
typeof(x)
```
Therefore, they have the value range:
Therefore, they have the value range:
$$
-9\,223\,372\,036\,854\,775\,808 \le x \le 9\,223\,372\,036\,854\,775\,807
$$
@@ -83,7 +83,7 @@ $$
$$
-2\,147\,483\,648 \le x \le 2\,147\,483\,647
$$
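These bounds need not be memorized: `typemin` and `typemax` return them. In the sketch below, the comparisons use `BigInt` (`big(2)^63`) so that computing $2^{63}$ does not itself overflow:

```julia
for T in (Int64, Int32)
    println(T, ": ", typemin(T), " … ", typemax(T))
end

# Compare against exact powers of two; big(2)^63 avoids Int64 overflow
@show typemin(Int64) == -big(2)^63
@show typemax(Int64) == big(2)^63 - 1
@show typemax(Int32) == 2^31 - 1
```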
The maximum value $2^{31}-1$ is conveniently a Mersenne prime:
By the way, the maximum value $2^{31}-1$ is a Mersenne prime:
```{julia}
using Primes
@@ -94,13 +94,13 @@ Negative numbers are represented in two's complement:
$x \Rightarrow -x$ corresponds to: _flip all bits, then add 1_
This looks like this:
This looks as follows:
::: {.content-visible when-format="html"}
![A representation of the fictional data type `Int4`](../images/Int4.png){width=50%}
:::
::: {.content-visible when-format="pdf"}
::: {.content-visible when-format="typst"}
![A representation of the fictional data type `Int4`](../images/Int4.png){width=50%}
:::
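The rule "flip all bits, then add 1" can be checked on the built-in 8-bit type `Int8`, which is used here instead of the fictional `Int4`. In Julia, `~` is the bitwise NOT and `bitstring` shows the stored bits:

```julia
x = Int8(5)
y = ~x + Int8(1)    # flip all bits (~), then add 1; Int8(1) keeps the type Int8

@show bitstring(x)  # "00000101"
@show bitstring(~x) # "11111010"
@show bitstring(y)  # "11111011"
@show y == -x
```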
@@ -115,7 +115,7 @@ x = 2^62 - 10 + 2^62
```{julia}
x + 20
```
No error message, no warning! Fixed-length integers do not lie on a line, but on a circle!
No error message, no warning! Fixed-length integers do not lie on a line, but on a **circle**.
:::
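To make the circle concrete, the sketch below shows that adding 1 to `typemax` lands on `typemin`. If an overflow must be detected instead, the functions in `Base.Checked` (for example `checked_add`) throw an `OverflowError` rather than wrapping around:

```julia
# Stepping past the largest value lands on the smallest one
@show typemax(Int64) + 1 == typemin(Int64)
@show Int8(127) + Int8(1)

# Checked arithmetic raises an error instead of wrapping around
using Base.Checked: checked_add
try
    checked_add(typemax(Int64), 1)
catch err
    println("caught: ", typeof(err))
end
```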
@@ -155,12 +155,12 @@ The operations `+`,`-`,`*` have the usual exact arithmetic **modulo $2^{64}$**.
#### Powers `a^b`
- Powers `a^n` are computed exactly modulo $2^{64}$ for natural exponents `n`.
- For negative exponents, the result is a floating-point number.
- For negative exponents, the result is a `Float`.
- `0^0` is [naturally](https://en.wikipedia.org/wiki/Zero_to_the_power_of_zero#cite_note-T4n3B-4) equal to 1.
```{julia}
(-2)^63, 2^64, 3^(-3), 0^0
```
- For natural exponents, [*exponentiation by squaring*](https://de.wikipedia.org/wiki/Bin%C3%A4re_Exponentiation) is used, so for example `x^23` requires only 7 multiplications:
- For natural exponents, [*exponentiation by squaring*](https://en.wikipedia.org/wiki/Exponentiation_by_squaring) is used, so for example `x^23` requires only 7 multiplications:
$$
x^{23} = \left( \left( (x^2)^2 \cdot x \right)^2 \cdot x \right)^2 \cdot x
$$
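The bracketing above corresponds to the left-to-right variant of square-and-multiply. The leading binary digit of $n$ initializes the result. Every further digit causes one squaring, and every digit 1 also causes one multiplication by $x$. The sketch counts these multiplications. `pow_by_squaring` is a hypothetical helper for illustration and is not Julia's internal implementation of `^`:

```julia
# Hypothetical helper: left-to-right square-and-multiply for n ≥ 1.
# Returns x^n together with the number of multiplications used.
function pow_by_squaring(x, n::Integer)
    n >= 1 || throw(DomainError(n, "this sketch requires n ≥ 1"))
    bits = reverse(digits(n; base=2))   # binary digits of n, most significant first
    result = x                          # the leading binary digit is always 1
    mults = 0
    for bit in bits[2:end]
        result *= result                # square
        mults += 1
        if bit == 1
            result *= x                 # multiply by x
            mults += 1
        end
    end
    return result, mults
end

@show pow_by_squaring(3, 23)   # (94143178827, 7)
```

For $23 = (10111)_2$, the loop performs 4 squarings and 3 extra multiplications, which matches the formula above.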
@@ -176,7 +176,7 @@ x = 40/5
- The functions `div(a,b)`, `rem(a,b)`, and `divrem(a,b)` compute the quotient of integer division, the corresponding remainder, or both as a tuple.
- For `div(a,b)` there is the operator form `a ÷ b` (input: `\div<TAB>`), and for `rem(a,b)` the operator form `a % b`.
- By default, division is "rounded toward zero", so the corresponding remainder has the same sign as the dividend `a`:
- By default, division uses "rounding toward zero", so the corresponding remainder has the same sign as the dividend `a`:
```{julia}
@show divrem( 27, 4)
@@ -185,9 +185,9 @@ x = 40/5
@show ( 27 ÷ -4, 27 % -4);
```
- A rounding rule other than `RoundToZero` can be specified as the third optional argument for the functions.
- A rounding rule other than `RoundToZero` can be specified as the third optional argument for these functions.
- `?RoundingMode` shows the possible rounding modes.
- For the rounding rule `RoundDown` ("toward minus infinity"), so that the corresponding remainder has the same sign as the divisor `b`, there are also the functions `fld(a,b)` *(floored division)* and `mod(a,b)`:
- For the rounding rule `RoundDown` ("toward minus infinity" -- so that the corresponding remainder has the same sign as the divisor `b`), there are also the functions `fld(a,b)` *(floored division)* and `mod(a,b)`:
```{julia}
@show divrem(-27, 4, RoundDown)
@@ -195,14 +195,14 @@ x = 40/5
@show (fld( 27, -4), mod( 27, -4));
```
For all rounding modes holds:
For all rounding modes, the following holds:
```
div(a, b, RoundingMode) * b + rem(a, b, RoundingMode) = a
```
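The identity can be spot-checked in a few lines. The sketch below loops over three rounding modes and all sign combinations of the example values `27, 4`, and then checks the `fld`/`mod` pair on a small grid. The choice of test values is arbitrary:

```julia
# Check div(a,b,mode)*b + rem(a,b,mode) == a for several modes and sign combinations
for mode in (RoundToZero, RoundDown, RoundUp)
    for (a, b) in ((27, 4), (-27, 4), (27, -4), (-27, -4))
        q, r = divrem(a, b, mode)
        @assert q * b + r == a
    end
end

# The same identity for fld and mod (rounding mode RoundDown)
@assert all(fld(a, b) * b + mod(a, b) == a for a in -30:30, b in (-7, -4, 4, 7))
println("identity holds for all tested cases")
```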
#### The `BigInt` Type
The `BigInt` type allows arbitrary-length integers. The required memory is dynamically allocated.
The `BigInt` type supports arbitrary-precision integers with dynamically allocated memory.
Numeric constants automatically have a sufficiently large type:
@@ -217,7 +217,7 @@ z = 10_000_000_000_000_000_000_000_000_000_000_000_000_000 # 10 sextillion
@show typeof(z);
```
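Library functions follow the same pattern. For an `Int64` argument, `factorial` does not return a wrapped-around result. It throws an `OverflowError` once $n > 20$. A `BigInt` argument yields the exact value. The value `n = 25` is an arbitrary example:

```julia
n = 25
try
    factorial(n)                  # the Int64 result would overflow
catch err
    println("Int64: ", typeof(err))
end

f = factorial(big(n))             # exact BigInt result
@show f typeof(f)
```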
Usually, one must explicitly request the `BigInt` type to avoid modulo $2^{64}$ arithmetic:
In most cases, you must explicitly specify the `BigInt` type to avoid modulo $2^{64}$ arithmetic:
```{julia}
@show 3^300 BigInt(3)^300;
@@ -225,7 +225,7 @@ Usually, one must explicitly request the `BigInt` type to avoid modulo $2^{64}$
*Arbitrary precision arithmetic* comes at a significant cost in memory and computation time.
We compare the time and memory requirements for summing 10 million integers as `Int64` and as `BigInt`.
We compare the time and memory requirements for summing 10 million integers as `Int64` versus `BigInt`.
```{julia}
# 10^7 random numbers, uniformly distributed between -10^7 and 10^7
@@ -235,7 +235,7 @@ vec_int = rand(-10^7:10^7, 10^7)
vec_bigint = BigInt.(vec_int)
```
An initial impression of the time and memory requirements is provided by the `@time` macro:
The `@time` macro provides a rough estimate of the required time and memory:
```{julia}
@time x = sum(vec_int)
@@ -246,7 +246,7 @@ An initial impression of the time and memory requirements is provided by the `@t
@show x typeof(x);
```
Due to Julia's just-in-time compilation, a single execution of a function is not very informative. The `BenchmarkTools` package provides the `@benchmark` macro, which calls a function multiple times and displays the execution times as a histogram.
Due to Julia's just-in-time compilation, timing a single function call is not very informative. The `BenchmarkTools` package provides the `@benchmark` macro, which calls a function multiple times and displays the execution times as a histogram.
:::{.ansitight}
```{julia}
@@ -263,8 +263,8 @@ using BenchmarkTools
The `BigInt` addition is more than 30 times slower.
:::{.content-hidden unless-format="xxx"}
The following function should compute the sum of all numbers from 1 to n using arithmetic of type T.
Due to the *type promotion rules*, it is sufficient for `T ≥ Int64` to initialize the accumulator variable with a number of type T.
The following function computes the sum of all numbers from 1 to n using arithmetic of type T.
Due to *type promotion rules*, it is sufficient to initialize the accumulator with a value of type T (for `T ≥ Int64`).
```{julia}
function mysum(n, T)
s = T(0)
@@ -303,7 +303,7 @@ using BenchmarkTools
```
The computation of $\sum_{n=1}^{10000000} n$ takes on my PC an average of 2 nanoseconds with standard 64-bit integers and over one second in *arbitrary precision arithmetic*, during which nearly 500MB of memory is also allocated.
The computation of $\sum_{n=1}^{10000000} n$ takes an average of 2 milliseconds on my PC with standard 64-bit integers and over one second in *arbitrary precision arithmetic*, during which nearly 500MB of memory is also allocated.
:::
:::
@@ -311,9 +311,8 @@ The computation of $\sum_{n=1}^{10000000} n$ takes on my PC an average of 2 nano
## Floating-Point Numbers
The German term for _floating point numbers_ comes in four variants, **[Gleit|Fließ]--[Komma|Punkt]--Zahlen**, and all of them appear in the literature.
In numerical mathematics, one also often speaks of **machine numbers**.
In numerical mathematics, the term **machine numbers** is also commonly used.
### Basic Idea
@@ -340,9 +339,9 @@ holds.
## Machine Numbers
The set of machine numbers $𝕄(b, p, e_{min}, e_{max})$ is characterized by the base $b$ used, the mantissa length $p$, and the value range of the exponent $\{e_{min}, ... ,e_{max}\}$.
The set of machine numbers $𝕄(b, p, e_{min}, e_{max})$ is characterized by the base $b$, the mantissa length $p$, and the value range of the exponent $\{e_{min}, ... ,e_{max}\}$.
In our convention, the mantissa of a normalized machine number has one digit (of base $b$) nonzero before the decimal point and $p-1$ digits after the decimal point.
In our convention, the mantissa of a normalized machine number has one nonzero digit (of base $b$) before the radix point and $p-1$ digits after it.
If $b=2$, the nonzero leading digit of a normalized mantissa is always 1, so only $p-1$ bits are needed to store the mantissa.
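For `Float64`, Julia reports the mantissa length $p$, including the implicit leading bit, via `precision`. The exponent limits can be read off `floatmin` and `floatmax`. The sketch below checks these against $𝕄(2, 53, -1022, 1023)$, which are the parameters of IEEE 754 double precision in the convention used here. `ldexp(m, e)` computes $m \cdot 2^e$ exactly:

```julia
p = precision(Float64)             # p = 53, of which only 52 bits are stored
@show p

# Smallest normalized number 2^e_min and largest number (2 - 2^(1-p)) * 2^e_max
@show floatmin(Float64) == ldexp(1.0, -1022)
@show floatmax(Float64) == ldexp(2 - ldexp(1.0, 1 - p), 1023)
```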
@@ -353,7 +352,7 @@ The IEEE 754 standard, implemented by most modern processors and programming lan
:::
### Structure of `Float64` according to [IEEE 754 standard](https://de.wikipedia.org/wiki/IEEE_754)
### Structure of `Float64` according to the [IEEE 754 standard](https://en.wikipedia.org/wiki/IEEE_754)
::: {.content-visible when-format="html"}
@@ -361,7 +360,7 @@ The IEEE 754 standard, implemented by most modern processors and programming lan
](../images/1024px-IEEE_754_Double_Floating_Point_Format.png)
:::
::: {.content-visible when-format="pdf"}
::: {.content-visible when-format="typst"}
![Structure of a `Float64` \mysmall{(Source: \href{https://commons.wikimedia.org/wiki/File:IEEE_754_Double_Floating_Point_Format.svg}{Codekaizen}, \href{https://creativecommons.org/licenses/by-sa/4.0}{CC BY-SA 4.0}, via Wikimedia Commons)}
](../images/1024px-IEEE_754_Double_Floating_Point_Format.png){width="70%"}
:::
@@ -372,7 +371,7 @@ The IEEE 754 standard, implemented by most modern processors and programming lan
- The values $E=0$ and $E=(11111111111)_2=2047$ are reserved for encoding special values such as
$\pm0, \pm\infty$, NaN _(Not a Number)_ and subnormal numbers.
- 52 bits for the (shortened) mantissa $M,\quad 0\le M<1$, corresponding to approximately 16 decimal digits
- Thus, the following number is represented:
- Thus, the number represented is:
$$ x=(-1)^S \cdot(1+M)\cdot 2^{E-1023}$$
An example:
@@ -380,7 +379,7 @@ An example:
x = 27.56640625
bitstring(x)
```
This can be done more nicely:
This can be displayed more clearly:
```{julia}
function printbitsf64(x::Float64)
@@ -410,9 +409,9 @@ $$
x = (1 + 1/2 + 1/8 + 1/16 + 1/32 + 1/256 + 1/4096) * 2^4
```
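The same computation can be done directly from the bit string. The hypothetical helper `decode_f64` below splits `bitstring(x)` into $S$, $E$ and $M$ and evaluates $x=(-1)^S \cdot(1+M)\cdot 2^{E-1023}$. It is a sketch that is valid only for normalized numbers ($0 < E < 2047$):

```julia
# Hypothetical helper: decode a normalized Float64 from its bit string
# via x = (-1)^S * (1 + M) * 2^(E - 1023).
function decode_f64(x::Float64)
    b = bitstring(x)                          # 64 characters '0'/'1'
    S = parse(Int, b[1:1]; base=2)            # sign bit
    E = parse(Int, b[2:12]; base=2)           # 11-bit biased exponent
    0 < E < 2047 || error("not a normalized number")
    m = parse(Int, b[13:64]; base=2)          # 52 stored mantissa bits as an integer
    M = ldexp(Float64(m), -52)                # M = m / 2^52, so 0 ≤ M < 1
    return ldexp((-1)^S * (1 + M), E - 1023)  # multiply by 2^(E - 1023) exactly
end

@show decode_f64(27.56640625)
```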
- The machine numbers 𝕄 form a finite, discrete subset of ℝ. There is a smallest and a largest machine number, and apart from these, all x∈𝕄 have a predecessor and successor in 𝕄.
- The set of machine numbers 𝕄 forms a finite, discrete subset of ℝ. There exists a smallest and a largest machine number; all other elements x∈𝕄 have both a predecessor and a successor in 𝕄.
- What is the successor of x in 𝕄? Here the lowest mantissa bit of x is 0, so we obtain the successor by setting this bit to 1.
- Converting a string of zeros and ones into the corresponding machine number is possible e.g. as follows:
- Converting a string of zeros and ones into the corresponding machine number:
```{julia}
@@ -445,8 +444,8 @@ printbitsf64(z)
@show nextfloat(1.) - 1 2^-52 eps(Float64);
```
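Positive machine numbers are ordered in the same way as their bit patterns. The successor can therefore also be obtained by adding 1 to the bit pattern, read as an unsigned integer. This also holds when the mantissa bits are all 1, because the carry then moves into the exponent field. `succ` is a hypothetical helper for this sketch. It is valid for finite $x \ge 0$ below `floatmax(Float64)`:

```julia
# Hypothetical helper: successor of x ≥ 0 by incrementing its bit pattern
succ(x::Float64) = reinterpret(Float64, reinterpret(UInt64, x) + 1)

@show succ(1.0) - 1.0 == eps(Float64)
@show succ(27.56640625) == nextfloat(27.56640625)
@show succ(prevfloat(2.0)) == 2.0   # the carry moves into the exponent field
```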
- Machine epsilon is a measure of the relative distance between machine numbers and quantifies the statement: "64-bit floating-point numbers have a precision of approximately 16 decimal digits."
- Machine epsilon is something completely different from the smallest positive floating-point number:
- Machine epsilon measures the relative distance between machine numbers and quantifies the statement: "64-bit floating-point numbers have a precision of approximately 16 decimal digits."
- Machine epsilon should not be confused with the smallest positive floating-point number:
```{julia}
floatmin(Float64)
@@ -456,10 +455,10 @@ floatmin(Float64)
$$
\epsilon' = \frac{\epsilon}{2}\approx 1.1\times 10^{-16}
$$
is the maximum relative error that can occur when rounding a real number to the nearest machine number.
- Since numbers in the interval $(1-\epsilon',1+\epsilon']$ are rounded to the machine number $1$, one can also define $\epsilon'$ as: *the largest number for which in machine arithmetic still holds: $1+\epsilon' = 1$.*
This is the maximum relative error that can occur when rounding a real number to the nearest machine number.
- Since numbers in the interval $[1,1+\epsilon']$ are rounded to the machine number $1$ (the endpoint $1+\epsilon'$ is a tie and goes to the even neighbor $1$), one can also define $\epsilon'$ as: *the largest number for which $1+\epsilon' = 1$ still holds in machine arithmetic.*
In this way, one can also compute machine epsilon:
This allows us to compute machine epsilon using floating-point arithmetic itself:
:::{.ansitight}
@@ -491,7 +490,7 @@ Eps
- In the interval $[1,2)$ there are $2^{52}$ equidistant machine numbers.
- After that, the exponent increases by 1 and the mantissa $M$ is reset to 0. Thus, the interval $[2,4)$ again contains $2^{52}$ equidistant machine numbers, as does the interval $[4,8)$ up to $[2^{1023}, 2^{1024})$.
- Likewise, in the intervals $\ [\frac{1}{2},1), \ [\frac{1}{4},\frac{1}{2}),...$ there are $2^{52}$ equidistant machine numbers each, down to $[2^{-1022}, 2^{-1021})$.
- This forms the set $𝕄_+$ of positive machine numbers, and it is
- This forms the set $𝕄_+$ of positive machine numbers, and we have
$$
𝕄 = -𝕄_+ \cup \{0\} \cup 𝕄_+
$$
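The doubling of the spacing from one interval to the next can be seen with `eps(x)`, which returns the spacing of the machine numbers at `x`. The machine numbers in $[1,2)$ can be counted by subtracting bit patterns, because consecutive positive machine numbers have consecutive bit patterns:

```julia
# Spacing of machine numbers doubles from one interval [2^k, 2^(k+1)) to the next
for x in (0.5, 1.0, 2.0, 4.0)
    println("eps(", x, ") = ", eps(x))
end

# Number of machine numbers in [1, 2), counted via their bit patterns
n = reinterpret(UInt64, 2.0) - reinterpret(UInt64, 1.0)
@show n == 2^52
```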
@@ -512,10 +511,10 @@ printbitsf64(floatmin(Float64))
## Rounding to Machine Numbers
- The mapping rd: ℝ $\rightarrow$ 𝕄 should round to the nearest representable number.
- Standard rounding mode: _round to nearest, ties to even_
If one lands exactly in the middle between two machine numbers *(tie)*, one chooses the one whose last mantissa bit is 0.
- Justification: this way, in 50% of the cases one rounds up and in 50% down, thus avoiding a "statistical drift" in longer calculations.
- The map rd: ℝ $\rightarrow$ 𝕄 should round to the nearest representable number.
- Standard rounding mode is _round to nearest, ties to even_:
when a value falls exactly midway between two machine numbers *(tie)*, the one with 0 as its last mantissa bit is selected.
- Justification: on average, ties are rounded up and down equally often, which avoids a "statistical drift" in longer calculations.
- For every real $x \neq 0$ within the range of the normalized machine numbers, the relative rounding error satisfies:
$$
\frac{|x-\text{rd}(x)|}{|x|} \le \frac{1}{2} \epsilon
@@ -524,7 +523,7 @@ $$
## Machine Number Arithmetic
The machine numbers, as a subset of ℝ, are not algebraically closed. Even the sum of two machine numbers will generally not be a machine number.
The machine numbers, as a subset of ℝ, are not algebraically closed. Even the sum of two machine numbers is generally not representable as a machine number.
:::{.callout-important}
The IEEE 754 standard requires that machine number arithmetic produces the *rounded exact result*:
@@ -534,7 +533,7 @@ $$
a \oplus b = \text{rd}(a + b)
$$
The standard requires the same for the square root `sqrt()`. For other standard functions such as
`sqrt()`, `log()`, `sin()` ...: they also return the machine number closest to the exact result.
`log()`, `sin()`, ... it only recommends correct rounding; typical implementations return a result within about one unit in the last place of the exact result.
:::
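Both rules can be observed directly. In the sketch below, `1 + ϵ/2` lies exactly halfway between 1 and its successor, so *ties to even* selects 1. The sum `0.1 + 0.2` is compared with the exact sum of the same two machine numbers. That exact sum is computed in `BigFloat`, whose default 256 bits hold it exactly, and is then rounded once to `Float64`:

```julia
ϵ = eps(1.0)

# Ties to even: 1 + ϵ/2 is a tie between 1 (last bit 0) and 1 + ϵ (last bit 1)
@show 1.0 + ϵ/2 == 1.0
# 1 + 3ϵ/2 is a tie between 1 + ϵ (last bit 1) and 1 + 2ϵ (last bit 0)
@show 1.0 + 3ϵ/2 == 1.0 + 2ϵ

# a ⊕ b = rd(a + b): round the exact sum once to Float64
a, b = 0.1, 0.2
exact = big(a) + big(b)
@show a + b == Float64(exact)
```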
@@ -570,7 +569,7 @@ $$
$$
One should also be reminded that even "simple" decimal fractions often cannot be represented exactly as machine numbers:
One should also remember that even "simple" decimal fractions cannot always be represented exactly as machine numbers:
$$
\begin{aligned}
@@ -598,7 +597,7 @@ Consequence:
0.2 + 0.1
```
When outputting a machine number, the binary fraction must be converted to a decimal fraction. One can also display more digits of this decimal fraction expansion:
When outputting a machine number, the binary fraction must be converted to a decimal fraction. Julia can display more digits of this decimal fraction expansion:
```{julia}
using Printf
@printf("%.30f", 0.1)
@@ -607,11 +606,11 @@ using Printf
```{julia}
@printf("%.30f", 0.3)
```
The binary fraction mantissa of a machine number can have a long or even infinitely periodic decimal expansion. Therefore,
one should not be misled into thinking there is "higher precision"!
The mantissa of a machine number is a binary fraction, so its decimal expansion is always finite (since $2^{-k} = 5^k \cdot 10^{-k}$), but it can be very long. However,
one should not be misled into thinking that this is "higher precision"!
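One way to see which number is actually stored is to convert it to an exact fraction with `Rational`. The denominator is a power of two, so the long decimal expansion is simply the exact value of a number that differs from $1/10$:

```julia
r = Rational(0.1)       # exact value of the machine number 0.1
@show r
@show denominator(r) == 2^55
@show r == 1//10
```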
:::{.callout-important}
Moral: when testing `Float`s for equality, one should almost always define a realistic accuracy `epsilon` appropriate to the problem and
Key message: When testing `Float`s for equality, one should almost always define a realistic tolerance `epsilon` appropriate to the problem and
test:
```julia
@@ -628,15 +627,15 @@ end
The gap between zero and the smallest normalized machine number $2^{-1022} \approx 2.22\times 10^{-308}$
is filled with subnormal machine numbers.
For understanding, let's take a simple model:
Let's look at a simple model:
- Let 𝕄(10,4,±5) be the set of machine numbers to base 10 with 4 mantissa digits (one before the decimal point, 3 after) and the exponent range -5 ≤ E ≤ 5.
- Then the normalized representation (nonzero leading digit)
of e.g. 1234.0 is 1.234e3 and of 0.00789 is 7.890e-3.
- It is important that machine numbers are kept normalized at every computation step. Only then is the mantissa length fully utilized and the accuracy is maximum.
- It is essential that machine numbers remain normalized at each computation step. Only then is the full mantissa length utilized, maximizing accuracy.
- The smallest positive normalized number in our model is `x = 1.000e-5`. Even `x/2` would have to be rounded to 0.
- Here, for many applications, it is advantageous to allow also subnormal *(subnormal)* numbers and represent `x/2` as `0.500e-5` or `x/20` as `0.050e-5`.
- This *gradual underflow* is of course associated with a loss of valid digits and thus accuracy.
- But for many applications, it is advantageous to also allow subnormal numbers and represent `x/2` as `0.500e-5` or `x/20` as `0.050e-5`.
- This *gradual underflow* is of course associated with a loss of significant digits and thus accuracy.
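In `Float64`, the same effect appears just below `floatmin(Float64)`. `issubnormal` tests whether a value is subnormal, and `nextfloat(0.0)` is the smallest positive subnormal number $2^{-1074}$. Halving that number produces a tie between 0 and $2^{-1074}$, which *ties to even* resolves to 0:

```julia
x = floatmin(Float64)               # smallest positive normalized number, 2^-1022
@show issubnormal(x) issubnormal(x / 2)
@show x / 2 > 0                     # gradual underflow instead of rounding to 0
@show nextfloat(0.0) == ldexp(1.0, -1074)
@show nextfloat(0.0) / 2            # a tie, rounded to the even neighbor 0.0
```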
In the `Float` data type, such *subnormal values* are represented by an exponent field in which all bits are equal to zero:
@@ -663,7 +662,7 @@ end
## Special Values
Floating-point arithmetic knows some special values, e.g.
Floating-point arithmetic defines certain special values, e.g.,
```{julia}
nextfloat(floatmax(Float64))
```
@@ -675,16 +674,16 @@ for x ∈ (NaN, Inf, -Inf, -0.0)
end
```
- An exponent overflow *(overflow)* leads to the result `Inf` or `-Inf`.
- An exponent overflow leads to the result `Inf` or `-Inf`.
```{julia}
2/0, -3/0, floatmax(Float64) * 1.01, exp(1300)
```
- One can continue calculating with it:
- One can continue calculating with these values:
```{julia}
-Inf + 20, Inf/30, 23/-Inf, sqrt(Inf), Inf * 0, Inf - Inf
```
- `NaN` *(Not a Number)* stands for the result of an operation that is undefined. All further operations with `NaN` also result in `NaN`.
- `NaN` *(Not a Number)* represents the result of an undefined operation. All further operations with `NaN` also result in `NaN`.
```{julia}
0/0, Inf - Inf, 2.3NaN, sqrt(NaN)
@@ -698,7 +697,7 @@ y = Inf - Inf
@show x==y NaN==NaN isfinite(NaN) isinf(NaN) isnan(x) isnan(y);
```
- There is a "minus zero". It signals an exponent underflow *(underflow)* of a magnitude that has become too small *negative* quantity.
- There is a "minus zero". It signals a numerical underflow of a small *negative* quantity.
```{julia}
@show 23/-Inf -2/exp(1200) -0.0==0.0;
@@ -709,7 +708,7 @@ y = Inf - Inf
Julia has the [usual mathematical functions](https://docs.julialang.org/en/v1/manual/mathematical-operations/#Rounding-functions)
`sqrt, exp, log, log2, log10, sin, cos,..., asin, acos,..., sinh,..., gcd, lcm, factorial,...,abs, max, min,...`,
including e.g. the [rounding functions](https://de.wikipedia.org/wiki/Abrundungsfunktion_und_Aufrundungsfunktion)
including e.g. the [rounding functions](https://en.wikipedia.org/wiki/Floor_and_ceiling_functions)
- `floor(T,x)` = $\lfloor x \rfloor$
- `ceil(T,x)` = $\lceil x \rceil$
@@ -722,7 +721,7 @@ floor(3.4), floor(Int64, 3.5), floor(Int64, -3.5)
ceil(3.4), ceil(Int64, 3.5), ceil(Int64, -3.5)
```
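The functions `floor`, `ceil`, `trunc` and `round` differ most visibly for negative numbers and for halfway cases. By default, `round` uses the same *ties to even* rule as floating-point rounding. The rounding mode `RoundNearestTiesAway` gives the schoolbook rule instead. The sample values are arbitrary:

```julia
x = -3.5
@show floor(Int, x) ceil(Int, x) trunc(Int, x) round(Int, x)

# Halfway cases: the default RoundNearest rounds ties to the even neighbor
@show round(Int, 2.5) round(Int, 3.5)
@show round(2.5, RoundNearestTiesAway)
```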
Also worth noting is `atan(y, x)`, the [two-argument arctangent](https://de.wikipedia.org/wiki/Arctan2). In other programming languages, it is often implemented as a function with its own name *atan2*.
Also worth noting is `atan(y, x)`, the two-argument arctangent (known as `atan2` in many programming languages, see [atan2](https://en.wikipedia.org/wiki/Atan2)).
This solves the problem of converting from Cartesian to polar coordinates without awkward case distinctions.
- `atan(y,x)` is the angle of the polar coordinates of (x,y) in the interval $(-\pi,\pi]$. In the 1st and 4th quadrants, it is therefore equal to `atan(y/x)`
@@ -732,9 +731,9 @@ atan(3, -2), atan(-3, 2), atan(-3/2)
```
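With `atan(y, x)` and `hypot`, the conversion to polar coordinates becomes a one-liner. `to_polar` is a hypothetical helper for this sketch, and the loop covers one point in each quadrant:

```julia
# Hypothetical helper: Cartesian (x, y) → polar (r, φ) with φ ∈ (-π, π]
to_polar(x, y) = (hypot(x, y), atan(y, x))

for (x, y) in ((1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0))
    r, φ = to_polar(x, y)
    println("(", x, ", ", y, ")  →  r = ", r, ", φ/π = ", φ / π)
end
```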
## Conversion Strings $\Longleftrightarrow$ Numbers
## Conversion Between Strings and Numbers
Conversion is possible with the functions `parse()` and `string()`.
Use the functions `parse()` and `string()` for such conversions:
```{julia}
parse(Int64, "1101", base=2)