---
engine: julia
---

# Containers

Julia offers a wide selection of container types with largely similar interfaces. We introduce `Tuple`, `Range`, and `Dict` here; the next chapter covers `Array`, `Vector`, and `Matrix`.

These containers are:

- **iterable:** You can iterate over the elements of the container:
  ```julia
  for x ∈ container ... end
  ```
- **indexable:** You can access elements via their index:
  ```julia
  x = container[i]
  ```

and some are also

- **mutable:** You can add, remove, and modify elements.

Furthermore, there are several common functions, e.g.,

- `length(container)` --- number of elements
- `eltype(container)` --- type of elements
- `isempty(container)` --- test whether container is empty
- `empty!(container)` --- empties container (only if mutable)

## Tuples

A tuple is an immutable container of elements. It is therefore not possible to add new elements or to change the value of an element.

```{julia}
t = (33, 4.5, "Hello")

@show t[2]       # indexable

for i ∈ t        # iterable
    println(i)
end
```

A tuple is a **heterogeneous** container. Each element has its own type, which is reflected in the tuple's type:

```{julia}
typeof(t)
```

Tuples are frequently used as function return values to return more than one object.

```{julia}
# Integer division and remainder:
# quotient and remainder are assigned to variables q and r
q, r = divrem(71, 6)
@show q r;
```

As you can see here, the parentheses can be omitted in certain constructs. This *implicit tuple packing/unpacking* is also commonly used in multiple assignments:

```{julia}
x, y, z = 12, 17, 203
```

```{julia}
y
```

Some functions require tuples as arguments or always return tuples. In such cases you sometimes need a tuple with a single element. It is written as:

```{julia}
x = (13,)   # a 1-element tuple
```

It is the comma, not the parentheses, that makes the tuple.

```{julia}
x = (13)    # not a tuple
```

## Ranges

We have already used *range* objects in numerical `for` loops.

```{julia}
r = 1:1000
typeof(r)
```

There are various *range* types. As you can see, they are parameterized by the numeric type; `UnitRange`, for example, is a *range* with step size 1. Ranges are usually constructed with the function `range()`.

The colon is special syntax:

- `a:b` is equivalent to `range(a, b)`
- `a:b:c` is equivalent to `range(a, c, step=b)`

*Ranges* are iterable and indexable, but not mutable.

```{julia}
(3:100)[20]   # the 20th element
```

Recall the semantics of the `for` loop: `for i in 1:1000` does **not** mean:

- 'The loop variable `i` is incremented by one in each iteration'

**but rather**

- 'The loop variable is successively assigned the values 1,2,3,...,1000 from the container'.

However, it would be very inefficient to actually create this container explicitly.

- _Ranges_ are "lazy" vectors that are never really stored as a concrete list anywhere. This is what makes them so useful as iterators in `for` loops: they are memory-efficient and fast.
- They are "recipes" or generators that respond to the query "Give me your next element!".
- In fact, the supertype `AbstractRange` is a subtype of `AbstractVector`.

The macro `@allocated` reports how many bytes of memory were allocated during the evaluation of an expression.

```{julia}
@allocated r = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
```

```{julia}
@allocated r = 1:20
```

The function `collect()` converts a range to a "real" vector.

```{julia}
collect(20:-3:1)
```

The *range* type `LinRange` is quite useful, e.g., when preparing data for plotting.

```{julia}
LinRange(2, 50, 300)
```

`LinRange(start, stop, n)` generates an equidistant list of `n` values whose first and last elements are the specified limits. With `collect()` you can also obtain the corresponding vector if needed.

## Dictionaries

- _Dictionaries_ (also called "associative arrays" or "lookup tables") are special containers.
- Entries in a vector `v` are addressable by an index 1,2,3,...: `v[i]`
- Entries in a _dictionary_ are addressable by more general _keys_.
- A _dictionary_ is a collection of _key-value_ pairs.
- Thus, _dictionaries_ in Julia have the parameterized type `Dict{S,T}`, where `S` is the type of the _keys_ and `T` is the type of the _values_.

They can be created explicitly:

```{julia}
# Population in 2020 in millions, source: Wikipedia
EW = Dict("Berlin" => 3.66, "Hamburg" => 1.85,
          "München" => 1.49, "Köln" => 1.08)
```

```{julia}
typeof(EW)
```

and indexed with the _keys_:

```{julia}
EW["Berlin"]
```

Of course, querying a non-existent _key_ is an error.

```{julia}
EW["Leipzig"]
```

You can also ask beforehand...

```{julia}
haskey(EW, "Leipzig")
```

... or use the function `get(dict, key, default)`, which does not throw an error for non-existent keys but returns the third argument instead.

```{julia}
@show get(EW, "Leipzig", -1) get(EW, "Berlin", -1);
```

You can also request all `keys` and `values` as special containers.

```{julia}
keys(EW)
```

```{julia}
values(EW)
```

You can iterate over the `keys`...

```{julia}
for i in keys(EW)
    n = EW[i]
    println("The city $i has $n million inhabitants.")
end
```

or directly over the `key-value` pairs.

```{julia}
for (stadt, ew) ∈ EW
    println("$stadt : $ew million.")
end
```

### Extending and Modifying

You can add additional `key-value` pairs to a `Dict`...

```{julia}
EW["Leipzig"] = 0.52
EW["Dresden"] = 0.52
EW
```

and change a `value`.

```{julia}
# Oh, the Leipzig number was from 2010, not 2020
EW["Leipzig"] = 0.597
EW
```

A pair can also be deleted via its `key`.

```{julia}
delete!(EW, "Dresden")
```

Many functions work with a `Dict` just as they do with other containers.

```{julia}
maximum(values(EW))
```

### Creating an Empty Dictionary

Without type specification ...

```{julia}
d1 = Dict()
```

and with type specification:

```{julia}
d2 = Dict{String, Int}()
```

### Conversion to Vectors: `collect()`

- `keys(dict)` and `values(dict)` are special data types.
- The function `collect()` converts them to a `Vector` type.
- `collect(dict)` returns a list of type `Vector{Pair{S,T}}`

```{julia}
collect(EW)
```

```{julia}
collect(keys(EW)), collect(values(EW))
```

### Ordered Iteration over a Dictionary

One option is to sort the keys. As strings, they are sorted alphabetically. With the `rev` parameter, sorting is done in reverse order.

```{julia}
for k in sort(collect(keys(EW)), rev = true)
    n = EW[k]
    println("$k has $n million inhabitants ")
end
```

Another option is to sort `collect(dict)`, which is a vector of pairs. With `by` we define what to sort by: here, the second element of the pair.

```{julia}
for (k,v) in sort(collect(EW), by = pair -> last(pair), rev=false)
    println("$k has $v million inhabitants")
end
```

### An Application of Dictionaries: Counting Frequencies

We do some "experimental probability" with two dice:

Let `l` be a list with the results of 100,000 rolls of a pair of dice, i.e., 100,000 numbers between 2 and 12.

How frequently do the numbers 2 to 12 occur?

We let the computer roll the dice:

```{julia}
l = rand(1:6, 100_000) .+ rand(1:6, 100_000)
```

We count the frequencies of the events using a dictionary. We take the event as the `key` and its frequency as the `value`.

```{julia}
# In this case, one could also solve this with a simple vector.
# A better illustration would be, e.g., word frequency in
# a text. Then i is not an integer but a word=string

d = Dict{Int,Int}()    # the dict for counting

for i in l             # for each i, d[i] is incremented.
    d[i] = get(d, i, 0) + 1
end
d
```

The result:

```{julia}
using Plots

plot(collect(keys(d)), collect(values(d)), seriestype=:scatter)
```

##### The explanatory image:

[https://math.stackexchange.com/questions/1204396/why-is-the-sum-of-the-rolls-of-two-dices-a-binomial-distribution-what-is-define](https://math.stackexchange.com/questions/1204396/why-is-the-sum-of-the-rolls-of-two-dices-a-binomial-distribution-what-is-define)
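
The same counting pattern carries over to keys that are not integers. As a minimal sketch of the word-frequency case mentioned in the code comment above, the following block counts the words of a short text. The sample string and the names `text` and `wordcount` are made up for illustration and are not part of the chapter's data.

```julia
# Hypothetical sample text; any string would do.
text = "the quick brown fox jumps over the lazy dog the fox"

# The dict for counting: word => number of occurrences.
wordcount = Dict{String,Int}()

for w in split(text)      # split on whitespace
    word = String(w)      # split returns SubStrings; convert to String keys
    wordcount[word] = get(wordcount, word, 0) + 1
end

# Print the words ordered by decreasing frequency.
for (word, n) in sort(collect(wordcount), by = last, rev = true)
    println("$word: $n")
end
```

Here `get(wordcount, word, 0)` plays the same role as in the dice example: a word seen for the first time starts from the default value 0.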