english improved

2026-03-05 20:09:16 +01:00
parent c6609d15f5
commit 733fe8c290
21 changed files with 954 additions and 1042 deletions
--- a/chapters/numerictypes.qmd
+++ b/chapters/numerictypes.qmd
@@ -31,24 +31,24 @@ for x ∈ ( 3, 3.3e4,  Int16(20), Float32(3.3e4), UInt16(9) )
 end
 ```

-## Integer Numbers *(integers)*
+## Integers

-Integer numbers are fundamentally stored as bit patterns of fixed length. Therefore, the value range is finite.
+Integers are stored as fixed-length bit patterns. Therefore, the value range is finite.
 **Within this value range** addition, subtraction, multiplication, and integer division with remainder
 are exact operations without rounding errors.

-Integer numbers come in two types: `Signed` (with sign) and `Unsigned`, which can be viewed as machine models for ℤ and ℕ respectively.
+Integers come in two types: `Signed` and `Unsigned`, which can be viewed as machine models for ℤ and ℕ, respectively.


 ### *Unsigned integers*
 ```{julia}
 subtypes(Unsigned)
 ```
-UInts are binary numbers with n=8, 16, 32, 64, or 128 bits length and the corresponding value range of
+`UInt`s are binary numbers with a bit width of n = 8, 16, 32, 64, or 128 and the corresponding value range of
 $$
 0 \le x < 2^n 
 $$
-They are used relatively rarely in *scientific computing*. In hardware-proximate programming, they are e.g. used for handling binary data and memory addresses. Therefore, Julia displays them by default as hexadecimal numbers (with prefix `0x` and digits `0-9a-f`).
+They are rarely used in *scientific computing*. In low-level, hardware-oriented programming, they are used, e.g., for handling binary data and memory addresses. By default, Julia displays them as hexadecimal numbers (with prefix `0x` and digits `0-9a-f`).

 ```{julia}
 x = 0x0033efef
@@ -74,7 +74,7 @@ In Julia, integer numbers are 64-bit by default:
 x = 42
 typeof(x)
 ```
-Therefore, they have the value range: 
+Therefore, they have the value range:
 $$
 -9.223.372.036.854.775.808 \le x \le 9.223.372.036.854.775.807
 $$
@@ -83,7 +83,7 @@ $$
 $$
 -2.147.483.648 \le x \le 2.147.483.647
 $$
-The maximum value $2^{31}-1$ is conveniently a Mersenne prime:
+Incidentally, the maximum value $2^{31}-1$ is a Mersenne prime:

 ```{julia}
 using Primes 
@@ -94,13 +94,13 @@ Negative numbers are represented in two's complement:

 $x \Rightarrow -x$ corresponds to: _flip all bits, then add 1_

-This looks like this:
+This looks as follows:

 ::: {.content-visible when-format="html"}
 ![A representation of the fictional data type `Int4`](../images/Int4.png){width=50%}
 :::

-::: {.content-visible when-format="pdf"}
+::: {.content-visible when-format="typst"}
 ![A representation of the fictional data type `Int4`](../images/Int4.png){width=50%}
 :::
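
 The rule "flip all bits, then add 1" can be checked on a real 8-bit type. The following is a minimal sketch; the variable names are chosen for illustration only.

 ```julia
 # Two's complement on Int8: negation = flip all bits, then add 1.
 x = Int8(5)
 flipped = ~x                  # bitwise NOT flips every bit
 negated = flipped + Int8(1)   # adding 1 completes the negation

 println(bitstring(x))         # bit pattern of  5
 println(bitstring(negated))   # bit pattern of -5
 println(negated == -x)

 # The most negative value has no positive counterpart, so negating it wraps around.
 println(-typemin(Int8) == typemin(Int8))
 ```

 The last line shows the asymmetry of the value range: `typemin(Int8)` is its own negative.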

@@ -115,7 +115,7 @@ x = 2^62 - 10 + 2^62
 ```{julia}
 x + 20
 ```
-No error message, no warning! Fixed-length integers do not lie on a line, but on a circle!
+No error message, no warning! Fixed-length integers do not lie on a line, but on a **circle.**

 :::
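
 If silent wraparound is not acceptable, the overflow can be detected or avoided. The sketch below assumes that `Base.checked_add` and `widen` are suitable for the task at hand; it is one possible approach, not the only one.

 ```julia
 # Plain `+` wraps around silently (arithmetic modulo 2^64).
 wrapped = typemax(Int64) + 1
 println(wrapped == typemin(Int64))

 # Base.checked_add performs the same addition but throws an OverflowError on wraparound.
 overflowed = try
     Base.checked_add(typemax(Int64), 1)
     false
 catch err
     err isa OverflowError
 end
 println(overflowed)

 # widen() moves to the next larger type (Int64 -> Int128) before the addition.
 println(widen(typemax(Int64)) + 1)
 ```

 Checked arithmetic costs an extra test per operation, which is why it is not the default.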

@@ -155,12 +155,12 @@ The operations `+`,`-`,`*` have the usual exact arithmetic **modulo $2^{64}$**.
 #### Powers `a^b`

 - Powers `a^n` are computed exactly modulo $2^{64}$ for natural exponents `n`.
- For negative exponents, the result is a floating-point number.
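+- For negative literal exponents such as `3^(-3)`, the result is a `Float64`; a negative integer exponent stored in a variable throws a `DomainError` instead.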
 - `0^0` is [naturally](https://en.wikipedia.org/wiki/Zero_to_the_power_of_zero#cite_note-T4n3B-4) equal to 1.
 ```{julia}
 (-2)^63, 2^64, 3^(-3), 0^0 
 ```
- For natural exponents, [*exponentiation by squaring*](https://de.wikipedia.org/wiki/Bin%C3%A4re_Exponentiation) is used, so for example `x^23` requires only 7 multiplications:
+- For natural exponents, [*exponentiation by squaring*](https://en.wikipedia.org/wiki/Exponentiation_by_squaring) is used, so for example `x^23` requires only 7 multiplications:
 $$
 x^{23} = \left( \left( (x^2)^2 \cdot x \right)^2  \cdot x \right)^2  \cdot x 
 $$
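
 The multiplication count can be reproduced with a small sketch of the left-to-right binary method. The function `pow_sketch` is a hypothetical helper written for this illustration; it is not the implementation used by Julia's `^`.

 ```julia
 # Hypothetical helper (not Base's implementation): left-to-right binary exponentiation.
 # Returns x^n and the number of multiplications used, for n >= 1.
 function pow_sketch(x::Integer, n::Integer)
     n >= 1 || throw(DomainError(n, "exponent must be a positive integer"))
     bits = digits(n, base=2)          # binary digits, least significant first
     result = x                        # accounts for the leading 1 bit
     mults = 0
     for bit in Iterators.reverse(bits[1:end-1])
         result *= result              # squaring step
         mults += 1
         if bit == 1
             result *= x               # extra factor for a 1 bit
             mults += 1
         end
     end
     return result, mults
 end

 value, mults = pow_sketch(3, 23)
 println(value == 3^23, "  multiplications: ", mults)
 ```

 Since 23 is `10111` in binary, the loop performs four squarings and three extra multiplications, matching the bracketing in the formula above.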
@@ -176,7 +176,7 @@ x = 40/5

 - The functions `div(a,b)`,  `rem(a,b)`, and `divrem(a,b)` compute the quotient of integer division, the corresponding remainder, or both as a tuple.
 - For `div(a,b)` there is the operator form `a ÷ b` (input: `\div<TAB>`), and for `rem(a,b)` the operator form `a % b`.
- By default, division is "rounded toward zero", so the corresponding remainder has the same sign as the dividend `a`:
+- By default, division uses "rounding toward zero", so the corresponding remainder has the same sign as the dividend `a`:

 ```{julia}
@show divrem( 27, 4)
@@ -185,9 +185,9 @@ x = 40/5
@show ( 27 ÷ -4,  27 % -4);
 ```

- A rounding rule other than `RoundToZero` can be specified as the third optional argument for the functions.
+- A rounding rule other than `RoundToZero` can be specified as the third optional argument for these functions.
 - `?RoundingMode` shows the possible rounding modes.
- For the rounding rule `RoundDown` ("toward minus infinity"), so that the corresponding remainder has the same sign as the divisor `b`, there are also the functions `fld(a,b)` *(floored division)* and `mod(a,b)`:
+- For the rounding rule `RoundDown` ("toward minus infinity" -- so that the corresponding remainder has the same sign as the divisor `b`), there are also the functions `fld(a,b)` *(floored division)* and `mod(a,b)`:

 ```{julia}
@show divrem(-27, 4, RoundDown)
@@ -195,14 +195,14 @@ x = 40/5
@show (fld( 27, -4), mod( 27, -4));
 ``` 

-For all rounding modes holds:
+For all rounding modes, the following holds:
 ```
 div(a, b, RoundingMode) * b + rem(a, b, RoundingMode) = a
 ```
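
 This identity can be verified directly. The following sketch checks it for three of the rounding modes and all sign combinations of the example values used above.

 ```julia
 # Check div(a,b,r)*b + rem(a,b,r) == a for three rounding modes.
 for r in (RoundToZero, RoundDown, RoundUp)
     for (a, b) in ((27, 4), (-27, 4), (27, -4), (-27, -4))
         q, m = div(a, b, r), rem(a, b, r)
         @assert q * b + m == a
         println(r, ":  ", a, " = ", q, " * ", b, " + ", m)
     end
 end
 ```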

 #### The `BigInt` Type

-The `BigInt` type allows arbitrary-length integers. The required memory is dynamically allocated.
+The `BigInt` type supports arbitrary-precision integers with dynamically allocated memory.

 Numeric constants automatically have a sufficiently large type:

@@ -217,7 +217,7 @@ z = 10_000_000_000_000_000_000_000_000_000_000_000_000_000   # 10 sextillion
@show typeof(z);
 ```

-Usually, one must explicitly request the `BigInt` type to avoid modulo $2^{64}$ arithmetic:
+In most cases, you must explicitly specify the `BigInt` type to avoid modulo $2^{64}$ arithmetic:

 ```{julia}
@show 3^300        BigInt(3)^300;
@@ -225,7 +225,7 @@ Usually, one must explicitly request the `BigInt` type to avoid modulo $2^{64}$

 *Arbitrary precision arithmetic* comes at a cost of significant memory and computation time.

-We compare the time and memory requirements for summing 10 million integers as `Int64` and as `BigInt`.
+We compare the time and memory requirements for summing 10 million integers as `Int64` versus `BigInt`.

 ```{julia}
 # 10^7 random numbers, uniformly distributed between -10^7 and 10^7 
@@ -235,7 +235,7 @@ vec_int = rand(-10^7:10^7, 10^7)
 vec_bigint = BigInt.(vec_int)
 ```

-An initial impression of the time and memory requirements is provided by the `@time` macro:
+The `@time` macro provides a rough estimate of the required time and memory:

 ```{julia}
@time x = sum(vec_int)
@@ -246,7 +246,7 @@ An initial impression of the time and memory requirements is provided by the `@t
@show x typeof(x);
 ```

-Due to Julia's just-in-time compilation, a single execution of a function is not very informative. The `BenchmarkTools` package provides the `@benchmark` macro, which calls a function multiple times and displays the execution times as a histogram.
+Due to Julia's just-in-time compilation, timing a single function call is not very informative. The `BenchmarkTools` package provides the `@benchmark` macro, which calls a function multiple times and displays the execution times as a histogram.

 :::{.ansitight}
 ```{julia}
@@ -263,8 +263,8 @@ using BenchmarkTools
 The `BigInt` addition is more than 30 times slower.

 :::{.content-hidden unless-format="xxx"}
-The following function should compute the sum of all numbers from 1 to n using arithmetic of type T.
-Due to the *type promotion rules*, it is sufficient for `T ≥ Int64` to initialize the accumulator variable with a number of type T.
+The following function computes the sum of all numbers from 1 to n using arithmetic of type T.
+Due to the *type promotion rules*, it is sufficient to initialize the accumulator with a value of type T (for `T ≥ Int64`).
 ```{julia}
 function mysum(n, T)
    s = T(0)
@@ -303,7 +303,7 @@ using BenchmarkTools
 ```


-The computation of $\sum_{n=1}^{10000000} n$ takes on my PC an average of 2 nanoseconds with standard 64-bit integers and over one second in *arbitrary precision arithmetic*, during which nearly 500MB of memory is also allocated.
+On my PC, computing $\sum_{n=1}^{10000000} n$ takes an average of 2 milliseconds with standard 64-bit integers and over one second in *arbitrary precision arithmetic*, which also allocates nearly 500 MB of memory.

 :::
 :::
@@ -311,9 +311,8 @@ The computation of $\sum_{n=1}^{10000000} n$ takes on my PC an average of 2 nano

 ## Floating-Point Numbers

-From _floating point numbers_, one can form German **[Gleit|Fließ]--[Komma|Punkt]--Zahlen**, and indeed all 4 variants appear in the literature.

-In numerical mathematics, one also often speaks of **machine numbers**.
+In numerical mathematics, the term **machine numbers** is also commonly used.


 ### Basic Idea
@@ -340,9 +339,9 @@ holds.
   
 ## Machine Numbers

-The set of machine numbers  $𝕄(b, p, e_{min}, e_{max})$ is characterized by the base $b$ used, the mantissa length $p$, and the value range of the exponent $\{e_{min}, ... ,e_{max}\}$.   
+The set of machine numbers  $𝕄(b, p, e_{min}, e_{max})$ is characterized by the base $b$, the mantissa length $p$, and the value range of the exponent $\{e_{min}, ... ,e_{max}\}$.   

-In our convention, the mantissa of a normalized machine number has one digit (of base $b$) nonzero before the decimal point and $p-1$ digits after the decimal point.
+In our convention, the mantissa of a normalized machine number has one nonzero digit (of base $b$) before the decimal point and $p-1$ digits after the decimal point.

 If $b=2$, one needs only $p-1$ bits to store the mantissa of normalized floating-point numbers.

@@ -353,7 +352,7 @@ The IEEE 754 standard, implemented by most modern processors and programming lan

 :::

-### Structure of `Float64` according to [IEEE 754 standard](https://de.wikipedia.org/wiki/IEEE_754)
+### Structure of `Float64` according to the [IEEE 754 standard](https://en.wikipedia.org/wiki/IEEE_754)


 ::: {.content-visible when-format="html"}
@@ -361,7 +360,7 @@ The IEEE 754 standard, implemented by most modern processors and programming lan
 ](../images/1024px-IEEE_754_Double_Floating_Point_Format.png)
 :::

-::: {.content-visible when-format="pdf"}
+::: {.content-visible when-format="typst"}
 ![Structure of a `Float64`  \mysmall{(Source: \href{https://commons.wikimedia.org/wiki/File:IEEE_754_Double_Floating_Point_Format.svg}{Codekaizen}, \href{https://creativecommons.org/licenses/by-sa/4.0}{CC BY-SA 4.0}, via Wikimedia Commons)}
 ](../images/1024px-IEEE_754_Double_Floating_Point_Format.png){width="70%"}
 :::
@@ -372,7 +371,7 @@ The IEEE 754 standard, implemented by most modern processors and programming lan
 - The values $E=0$ and $E=(11111111111)_2=2047$ are reserved for encoding special values such as
 $\pm0, \pm\infty$, NaN _(Not a Number)_ and subnormal numbers.
 - 52 bits for the (shortened) mantissa $M,\quad 0\le M<1$, corresponding to approximately 16 decimal digits
- Thus, the following number is represented:
+- Thus, the number represented is:
 $$ x=(-1)^S \cdot(1+M)\cdot 2^{E-1023}$$

 An example:
@@ -380,7 +379,7 @@ An example:
 x = 27.56640625
 bitstring(x)
 ```
-This can be done more nicely:
+This can be displayed more clearly:

 ```{julia}
 function printbitsf64(x::Float64)
@@ -410,9 +409,9 @@ $$
 x = (1 + 1/2 + 1/8 + 1/16 + 1/32 + 1/256 + 1/4096) * 2^4
 ```

- The machine numbers 𝕄 form a finite, discrete subset of ℝ. There is a smallest and a largest machine number, and apart from these, all x∈𝕄 have a predecessor and successor in 𝕄.
+- The set of machine numbers 𝕄 forms a finite, discrete subset of ℝ. There exists a smallest and a largest machine number; all other elements x∈𝕄 have both a predecessor and successor in 𝕄.
 - What is the successor of x in 𝕄? To do this, we set the smallest mantissa bit from 0 to 1.
- Converting a string of zeros and ones into the corresponding machine number is possible e.g. as follows:
+- Converting a string of zeros and ones into the corresponding machine number:


 ```{julia}
@@ -445,8 +444,8 @@ printbitsf64(z)
@show nextfloat(1.) - 1   2^-52   eps(Float64);
 ```

- Machine epsilon is a measure of the relative distance between machine numbers and quantifies the statement: "64-bit floating-point numbers have a precision of approximately 16 decimal digits."
- Machine epsilon is something completely different from the smallest positive floating-point number:
+- Machine epsilon measures the relative distance between machine numbers and quantifies the statement: "64-bit floating-point numbers have a precision of approximately 16 decimal digits."
+- Machine epsilon should not be confused with the smallest positive floating-point number:

 ```{julia}
 floatmin(Float64)
@@ -456,10 +455,10 @@ floatmin(Float64)
 $$
 \epsilon' = \frac{\epsilon}{2}\approx 1.1\times 10^{-16}
 $$
-is the maximum relative error that can occur when rounding a real number to the nearest machine number.
- Since numbers in the interval $(1-\epsilon',1+\epsilon']$ are rounded to the machine number $1$, one can also define $\epsilon'$ as: *the largest number for which in machine arithmetic still holds: $1+\epsilon' = 1$.*
+This is the maximum relative error that can occur when rounding a real number to the nearest machine number.
+- Since numbers in the interval $[1-\epsilon'/2,\,1+\epsilon']$ are rounded to the machine number $1$, one can also define $\epsilon'$ as: *the largest number for which $1+\epsilon' = 1$ still holds in machine arithmetic.*

-In this way, one can also compute machine epsilon:
+This makes it possible to compute machine epsilon using floating-point arithmetic:

 :::{.ansitight}

@@ -491,7 +490,7 @@ Eps
 - In the interval $[1,2)$ there are $2^{52}$ equidistant machine numbers.
 - After that, the exponent increases by 1 and the mantissa $M$ is reset to 0. Thus, the interval $[2,4)$ again contains $2^{52}$ equidistant machine numbers, as does the interval $[4,8)$ up to $[2^{1023}, 2^{1024})$.
 - Likewise, in the intervals $\ [\frac{1}{2},1), \ [\frac{1}{4},\frac{1}{2}),...$  there are $2^{52}$ equidistant machine numbers each, down to $[2^{-1022}, 2^{-1021})$.
- This forms the set $𝕄_+$ of positive machine numbers, and it is
+- This forms the set $𝕄_+$ of positive machine numbers, and we have
 $$
 𝕄  =  -𝕄_+  \cup \{0\} \cup 𝕄_+
 $$
@@ -512,10 +511,10 @@ printbitsf64(floatmin(Float64))

 ## Rounding to Machine Numbers

- The mapping rd: ℝ $\rightarrow$  𝕄 should round to the nearest representable number.
- Standard rounding mode: _round to nearest, ties to even_
-  If one lands exactly in the middle between two machine numbers *(tie)*, one chooses the one whose last mantissa bit is 0.
- Justification: this way, in 50% of the cases one rounds up and in 50% down, thus avoiding a "statistical drift" in longer calculations.
+- The map rd: ℝ $\rightarrow$  𝕄 should round to the nearest representable number.
+- The standard rounding mode is _round to nearest, ties to even_:
+  when a value falls exactly midway between two machine numbers *(tie)*, the one whose last mantissa bit is 0 is selected.
+- Justification: this way, we round up in 50% of the cases and down in 50% of the cases, thus avoiding a "statistical drift" in longer calculations.
 - It holds:
 $$
 \frac{|x-\text{rd}(x)|}{|x|} \le \frac{1}{2} \epsilon
@@ -524,7 +523,7 @@ $$

 ## Machine Number Arithmetic

-The machine numbers, as a subset of ℝ, are not algebraically closed. Even the sum of two machine numbers will generally not be a machine number.
+The machine numbers, as a subset of ℝ, are not closed under arithmetic operations. Even the sum of two machine numbers is generally not representable as a machine number.

 :::{.callout-important}
 The IEEE 754 standard requires that machine number arithmetic produces the *rounded exact result*:
@@ -534,7 +533,7 @@ $$
 a \oplus b = \text{rd}(a + b)
 $$ 
 The same must hold for the implementation of standard functions such as
-`sqrt()`, `log()`, `sin()` ...: they also return the machine number closest to the exact result.
+`sqrt()`, `log()`, `sin()`, ... -- they also return the machine number closest to the exact result.
 :::
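
 The rule $a \oplus b = \text{rd}(a + b)$ can be illustrated for one addition. The sketch assumes that `BigFloat` with its default precision of 256 bits is wide enough to hold the sum of two `Float64` values exactly, which is the case for the operands chosen here.

 ```julia
 a, b = 0.1, 0.2

 # BigFloat (256-bit mantissa by default) holds this sum of two Float64 values exactly.
 exact_sum = big(a) + big(b)

 # Converting back to Float64 rounds to the nearest machine number, i.e. applies rd().
 println(a + b == Float64(exact_sum))

 # The rounding error of the machine addition itself is tiny but nonzero.
 println(Float64(big(a + b) - exact_sum))
 ```

 The machine sum agrees with the rounded exact sum, although it differs from the exact sum itself.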


@@ -570,7 +569,7 @@ $$
 $$


-One should also be reminded that even "simple" decimal fractions often cannot be represented exactly as machine numbers:
+One should also remember that even "simple" decimal fractions cannot always be represented exactly as machine numbers:

 $$
 \begin{aligned}
@@ -598,7 +597,7 @@ Consequence:
 0.2 + 0.1
 ```

-When outputting a machine number, the binary fraction must be converted to a decimal fraction. One can also display more digits of this decimal fraction expansion:
+When outputting a machine number, the binary fraction must be converted to a decimal fraction. Julia can display more digits of this decimal fraction expansion:
 ```{julia}
 using Printf
@printf("%.30f", 0.1)
@@ -607,11 +606,11 @@ using Printf
 ```{julia}
@printf("%.30f", 0.3)
 ```
-The binary fraction mantissa of a machine number can have a long or even infinitely periodic decimal expansion. Therefore,
-one should not be misled into thinking there is "higher precision"!
+The binary mantissa of a machine number can have a very long (though always finite) decimal expansion. But
+one should not be misled into thinking that these additional digits mean "higher precision"!

 :::{.callout-important}
-Moral: when testing `Float`s for equality, one should almost always define a realistic accuracy `epsilon` appropriate to the problem and
+Key message: When testing `Float`s for equality, one should almost always define a realistic tolerance `epsilon` appropriate to the problem and
 test:

 ```julia
@@ -628,15 +627,15 @@ end
 The gap between zero and the smallest normalized machine number $2^{-1022} \approx 2.22\times 10^{-308}$ 
 is filled with subnormal machine numbers.

-For understanding, let's take a simple model:
+Let's look at a simple model:

 - Let 𝕄(10,4,±5) be the set of machine numbers to base 10 with 4 mantissa digits (one before the decimal point, 3 after) and the exponent range -5 ≤ E ≤ 5.
 - Then the normalized representation (nonzero leading digit)
 of e.g. 1234.0 is 1.234e3 and of 0.00789 is 7.890e-3.
- It is important that machine numbers are kept normalized at every computation step. Only then is the mantissa length fully utilized and the accuracy is maximum.
+- It is essential that machine numbers remain normalized at each computation step. Only then is the full mantissa length utilized, maximizing accuracy.
 - The smallest positive normalized number in our model is `x = 1.000e-5`. Already `x/2` would have to be rounded to 0.
- Here, for many applications, it is advantageous to allow also subnormal *(subnormal)* numbers and represent `x/2` as `0.500e-5` or `x/20` as `0.050e-5`.
- This *gradual underflow* is当然 associated with a loss of valid digits and thus accuracy.
+- But for many applications, it is advantageous to also allow subnormal numbers and represent `x/2` as `0.500e-5` or `x/20` as `0.050e-5`.
+- This *gradual underflow* is of course associated with a loss of significant digits and thus accuracy.
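
 The same behavior can be observed for `Float64`. The following sketch uses the Base functions `floatmin`, `issubnormal`, and `nextfloat`; the variable names are chosen for illustration.

 ```julia
 x = floatmin(Float64)          # smallest positive normalized Float64, 2^-1022

 println(issubnormal(x))        # x itself is still normalized
 println(issubnormal(x / 2))    # below the normalized range, but still nonzero
 println(x / 2 > 0)

 # Subnormals lose significant bits: the smallest one keeps a single mantissa bit.
 tiny = nextfloat(0.0)
 println(tiny)
 println(tiny / 2 == 0.0)       # nothing representable is left below it
 ```

 Halving `tiny` finally rounds to zero, which is the point where gradual underflow ends.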

 In the `Float` data type, such *subnormal values* are represented by an exponent field in which all bits are equal to zero:

@@ -663,7 +662,7 @@ end

 ## Special Values

-Floating-point arithmetic knows some special values, e.g.
+Floating-point arithmetic defines certain special values, e.g.,
 ```{julia}
 nextfloat(floatmax(Float64))
 ```
@@ -675,16 +674,16 @@ for x ∈ (NaN, Inf, -Inf, -0.0)
 end
 ```

- An exponent overflow *(overflow)* leads to the result `Inf` or `-Inf`.
+- An exponent overflow leads to the result `Inf` or `-Inf`.
 ```{julia}
 2/0, -3/0, floatmax(Float64) * 1.01, exp(1300)
 ```
- One can continue calculating with it:
+- One can continue calculating with these values:

 ```{julia}
 -Inf + 20, Inf/30, 23/-Inf, sqrt(Inf),  Inf * 0, Inf - Inf
 ```
- `NaN` *(Not a Number)* stands for the result of an operation that is undefined. All further operations with `NaN` also result in `NaN`.
+- `NaN` *(Not a Number)* represents the result of an undefined operation. All further operations with `NaN` also result in `NaN`.

 ```{julia}
 0/0, Inf - Inf, 2.3NaN, sqrt(NaN)
@@ -698,7 +697,7 @@ y = Inf - Inf
@show x==y   NaN==NaN  isfinite(NaN) isinf(NaN) isnan(x) isnan(y);  
 ``` 

- There is a "minus zero". It signals an exponent underflow *(underflow)* of a magnitude that has become too small *negative* quantity.
+- There is a "minus zero". It signals a numerical underflow of a small *negative* quantity.

 ```{julia}
@show 23/-Inf   -2/exp(1200)    -0.0==0.0;
@@ -709,7 +708,7 @@ y = Inf - Inf
 Julia has the [usual mathematical functions](https://docs.julialang.org/en/v1/manual/mathematical-operations/#Rounding-functions)  
 `sqrt, exp, log, log2, log10, sin, cos,..., asin, acos,..., sinh,..., gcd, lcm, factorial,...,abs, max, min,...`,

-including e.g. the [rounding functions](https://de.wikipedia.org/wiki/Abrundungsfunktion_und_Aufrundungsfunktion)
+including e.g. the [rounding functions](https://en.wikipedia.org/wiki/Floor_and_ceiling_functions)

 - `floor(T,x)` = $\lfloor x \rfloor$
 - `ceil(T,x)`  = $\lceil x \rceil$
@@ -722,7 +721,7 @@ floor(3.4),   floor(Int64, 3.5),   floor(Int64, -3.5)
 ceil(3.4),    ceil(Int64, 3.5),    ceil(Int64, -3.5)
 ```

-Also worth noting is `atan(y, x)`, the [two-argument arctangent](https://de.wikipedia.org/wiki/Arctan2). In other programming languages, it is often implemented as a function with its own name *atan2*.
+Also worth noting is `atan(y, x)`, the two-argument arctangent (known as `atan2` in many programming languages, see [atan2](https://en.wikipedia.org/wiki/Atan2)). 
 This solves the problem of converting from Cartesian to polar coordinates without awkward case distinctions.

 - `atan(y,x)` is the angle of the polar coordinates of (x,y) in the interval $(-\pi,\pi]$. In the 1st and 4th quadrants, it is therefore equal to `atan(y/x)`
@@ -732,9 +731,9 @@ atan(3, -2),    atan(-3, 2),    atan(-3/2)
 ```
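
 The conversion to polar coordinates mentioned above can be sketched as follows. The function `to_polar` is a hypothetical helper introduced here for illustration.

 ```julia
 # Hypothetical helper: Cartesian -> polar coordinates without case distinctions.
 to_polar(x, y) = (hypot(x, y), atan(y, x))

 for (x, y) in ((1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0))
     r, φ = to_polar(x, y)
     # atan(y/x) cannot distinguish opposite quadrants; atan(y, x) can.
     println((x, y), "  r = ", r, "  φ = ", φ, "  atan(y/x) = ", atan(y / x))
 end
 ```

 In the 2nd and 3rd quadrants the two results differ by π, because the quotient `y/x` discards the individual signs.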


-##  Conversion  Strings $\Longleftrightarrow$ Numbers
+## Conversion Between Strings and Numbers

-Conversion is possible with the functions `parse()` and `string()`.
+Use the functions `parse()` and `string()` for such conversions:

 ```{julia}
 parse(Int64, "1101", base=2)