---
engine: julia
---
```{julia}
#| error: false
#| echo: false
#| output: false
using InteractiveUtils
import QuartoNotebookWorker
Base.stdout = QuartoNotebookWorker.with_context(stdout)
```
```{julia}
#| error: false
#| echo: false
#| output: false
flush(stdout)
```
# Input and Output
```{julia}
#| error: false
#| echo: false
#| output: false
# https://github.com/JuliaLang/julia/blob/master/base/show.jl#L516-L520
# https://github.com/JuliaLang/julia/blob/master/base/show.jl#L3073-L3077
using InteractiveUtils
import QuartoNotebookWorker
Base.stdout = QuartoNotebookWorker.with_context(stdout)
myactive_module() = Main.Notebook
Base.active_module() = myactive_module()
```
## Console
The operating system normally provides 3 channels (_streams_) for a program:
- Standard input channel `stdin`
- Standard output channel `stdout` and
- Standard error output channel `stderr`.
When the program is started in a terminal (also called console or shell), it can read keyboard input via `stdin`, and its output appears in the terminal via `stdout` and `stderr`.
- Writing to `stdout`: `print()`, `println()`, `printstyled()`
- Writing to `stderr`: `print(stderr, ...)`, `println(stderr, ...)`, `printstyled(stderr, ...)`
- Reading from `stdin`: `readline()`
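As a minimal sketch of the write variants listed above, the following hypothetical helper `report` (not part of Julia) prints to whichever stream it is given. An in-memory `IOBuffer` stands in for a stream whose content can be inspected afterwards.

```{.julia}
# Hypothetical helper: write a tagged message to an arbitrary stream
report(io::IO, msg) = println(io, "[info] ", msg)

report(stdout, "regular output")        # appears on stdout
report(stderr, "diagnostic output")     # appears on stderr
printstyled("highlighted output\n"; color = :green, bold = true)

# An in-memory buffer behaves like any other output stream
buf = IOBuffer()
report(buf, "captured")
captured = String(take!(buf))           # "[info] captured\n"
```

Because all print functions accept a stream as an optional first argument, the same code can write to the terminal, to a buffer, or to a file.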
### Input
The language _Python_ provides a function `input()`:
```{.python}
ans = input("Please enter a positive number!")
```
The function prints the prompt, waits for input, and returns the
input as a `string`.
In Julia, you can implement this function as follows:
```{julia}
function input(prompt = "Input:")
    println(prompt)
    flush(stdout)
    return chomp(readline())
end
```
**Comments**
- Output is usually buffered. `flush(stdout)` empties the buffer, so the prompt is guaranteed to appear before the program waits for input.
- By default, `readline()` returns the line without its trailing newline `\n`; only `readline(keep=true)` retains it. The function `chomp()` removes a possible line break from the end of a string, so here it is merely a safeguard.
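How `readline()` and `chomp()` treat the line break can be checked without a keyboard by reading from an in-memory `IOBuffer`. The buffer and its content are assumptions made for this sketch; reading from `stdin` follows the same rules.

```{.julia}
io = IOBuffer("34 56\n")

stripped = readline(io)               # default keep=false: "34 56"
seekstart(io)                         # rewind the buffer to read the line again
kept = readline(io, keep = true)      # "34 56\n", line break retained

chomped = chomp(kept)                 # "34 56", line break removed again
```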
```{julia}
#| eval: false
a = input("Please enter 2 numbers!")
```
```{julia}
#| echo: false
a = "34 56"
```
### Processing the Input
> `split(str)` splits a string into "words" and returns them as an array of (sub)strings:
```{julia}
av = split(a)
```
> `parse(T, str)` tries to convert `str` to type `T`:
```{julia}
v = parse.(Int, av)
```
`parse()` throws an error if the string cannot be parsed as a value of type `T`. You can catch the error with
`try/catch`, or use the function `tryparse(T, str)`, which returns `nothing` in that case; the result can then be
tested, e.g., with `isnothing()`.
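The `tryparse`/`isnothing` pattern might look as follows. The helper name `parse_ints` is hypothetical, and silently skipping invalid tokens is just one possible policy chosen for this sketch.

```{.julia}
# Hypothetical helper: parse all whitespace-separated integers in a string,
# reporting and skipping every token that is not a valid Int
function parse_ints(str)
    result = Int[]
    for word in split(str)
        value = tryparse(Int, word)      # nothing if parsing fails
        if isnothing(value)
            println(stderr, "Ignoring invalid input: ", word)
        else
            push!(result, value)
        end
    end
    return result
end

numbers = parse_ints("34 abc 56")        # [34, 56]
```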
### Reading Individual Keystrokes
- `readline()` and similar functions wait for the input to be completed by pressing the `Enter` key.
- Techniques for reading individual _keystrokes_ can be found here:
- [https://stackoverflow.com/questions/56888266/how-to-read-keyboard-inputs-at-every-keystroke-in-julia](https://stackoverflow.com/questions/56888266/how-to-read-keyboard-inputs-at-every-keystroke-in-julia)
- [https://stackoverflow.com/questions/60954235/how-can-i-test-whether-stdin-has-input-available-in-julia](https://stackoverflow.com/questions/60954235/how-can-i-test-whether-stdin-has-input-available-in-julia)
## Formatted Output with the `Printf` Macro
Often you want to output numbers or strings with a strict format specification - total length, decimal places, right/left-aligned, etc.
To this end, the `Printf` package defines the macros `@sprintf` and `@printf`, which work very similarly to the corresponding C functions.
```{julia}
using Printf
x = 123.7876355638734
@printf("Output right-aligned with max. 10 character width and 3 decimal places: x= %10.3f", x)
```
The first argument is a string containing placeholders (here: `%10.3f`) for the values to be output, followed by these values as further arguments.
Placeholders have the form
```
%[flags][width][.precision]type
```
where the entries in square brackets are all optional.
**Type specifications in placeholders**
| | |
|:--|:------------|
|`%s`| `string`|
|`%i`| `integer`|
|`%o`| `integer octal (base=8)`|
|`%x, %X`| `integer hexadecimal (base=16) with digits 0-9abcdef or 0-9ABCDEF, resp.`|
|`%f`| `floating point number`|
|`%e`| `floating point number, scientific representation`|
|`%g`| `floating point, uses %f or %e depending on value`|
: {.striped .hover}
**Flags**
| | |
|:----|:-----|
|Plus sign `+`| always print the sign, also for positive numbers|
|Minus sign `-`| left-aligned (default: right-aligned)|
|Zero `0`| pad with leading zeros|
: {.striped .hover}
**Width**
```
Minimum number of characters used (more will be taken if necessary)
```
### Examples:
```{julia}
using Printf # Don't forget to load the package!
```
```{julia}
@printf("|%s|", "Hello") # string with placeholder for string
```
The vertical bars are not part of the placeholder. They are intended to indicate the boundaries of the output field.
```{julia}
@printf("|%10s|", "Hello") # Minimum length, right-aligned
```
```{julia}
@printf("|%-10s|", "Hello") # left-aligned
```
```{julia}
@printf("|%3s|", "Hello") # Length specification can be exceeded
# Better a 'badly formatted' table than incorrect values!
```
```{julia}
j = 123
k = 90019001
l = 3342678
@printf("j= %012i, k= %-12i, l = %12i", j, k, l) # 0-flag for leading zeros
```
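For comparison, here is a short sketch of the three flags `+`, `-` and `0` applied to the same integer. `@sprintf` is used so that the results are available as strings; the variable names are chosen only for this illustration.

```{.julia}
using Printf

s_plus  = @sprintf("|%+6i|", 42)     # "|   +42|"  sign is always printed
s_neg   = @sprintf("|%+6i|", -42)    # "|   -42|"
s_minus = @sprintf("|%-6i|", 42)     # "|42    |"  left-aligned
s_zero  = @sprintf("|%06i|", 42)     # "|000042|"  padded with leading zeros
```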
`@printf` and `@sprintf` can be called like functions:
```{julia}
@printf("%i %i", 22, j)
```
Alternatively, they can be called in macro style, i.e., without parentheses and without commas between the arguments:
```{julia}
@printf "%i %i" 22 j
```
`@printf` can take a stream as its first argument.
Otherwise, the argument list consists of
- format string with placeholders
- variables in the order of the placeholders, matching in number and type to the placeholders
```{julia}
@printf(stderr, "First result: %i %s\nSecond result %i",
        j, "(estimated)", k)
```
The macro `@sprintf` does not print anything but returns the filled formatted string:
```{julia}
str = @sprintf("x = %10.6f", π );
```
```{julia}
str
```
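Since `@sprintf` returns an ordinary string, it is convenient wherever formatted text is needed as a value. A small sketch, with a made-up file naming scheme:

```{.julia}
using Printf

# Zero-padded, consecutively numbered file names (naming scheme is an assumption)
labels = [@sprintf("run_%03i.dat", i) for i in 1:3]
# ["run_001.dat", "run_002.dat", "run_003.dat"]
```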
### Formatting Floating Point Numbers:
Meaning of the _precision_ value:
- `%f` and `%e` format: number of decimal places
- `%g` format: maximum number of significant digits output (integer + decimal places)
```{julia}
x = 123456.7890123456
@printf("%20.4f %20.4e", x, x) # 4 decimal places
```
```{julia}
@printf("%20.7f %20.7e", x, x) # 7 decimal places
```
```{julia}
@printf("%20.7g %20.4g", x, x) # total 7 and 4 digits respectively
```
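To see how `%g` switches between the two notations, one can print a few values of very different magnitude side by side in all three formats. The sample values are arbitrary.

```{.julia}
using Printf

# Same precision, three formats: %g picks fixed or scientific notation by magnitude
for val in (0.000012345, 0.12345, 123.45, 123456789.0)
    @printf("%14.4g | %18.4f | %14.4e\n", val, val, val)
end

tiny     = @sprintf("%g", 1.0e-10)   # very small value: scientific notation
moderate = @sprintf("%g", 100.0)     # moderate value: fixed notation
```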
## File Operations
Files are
- opened $\Longrightarrow$ a new _stream_-object is created (in addition to `stdin, stdout, stderr`)
- then this _stream_ can be read from and written to
- closed $\Longrightarrow$ _stream_-object is detached from file
```{.julia}
stream = open(path, mode)
```
- path: filename/path
- mode:
```
"r" read, opens at file beginning
"w" write, opens at file beginning (file is created or overwritten)
"a" append, opens to continue writing at file end
```
Let's write a file:
```{julia}
file = open("datei.txt", "w")
```
```{julia}
@printf(file, "%10i\n", k)
```
```{julia}
println(file, " second line")
```
```{julia}
close(file)
```
Let's look at the file:
```{julia}
;cat datei.txt
```
...and now we open it again for reading:
```{julia}
stream = open("datei.txt", "r")
```
`readlines(stream)` returns all lines of a text file as a vector of strings.
`eachline(stream)` returns an iterator over the lines of the file.
```{julia}
n = 0
for line in eachline(stream)    # Read line by line
    n += 1
    println(n, line)            # Print with line number
end
close(stream)
```
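The append mode `"a"` and `readlines()` can be tried out in the same way. The sketch below uses a scratch file created with `tempname()` (an assumption, so that no existing file is touched) and also shows `read(path, String)`, which returns the whole file as a single string.

```{.julia}
path = tempname()                 # name of a scratch file, used only in this sketch

f = open(path, "w")               # create (or overwrite) the file
println(f, "first line")
close(f)

f = open(path, "a")               # reopen and continue writing at the end
println(f, "second line")
close(f)

all_lines = readlines(path)       # vector of strings, line breaks removed
whole = read(path, String)        # entire file content as one string
rm(path)                          # clean up
```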
## Packages for File Formats
For input and output in various file formats, there are Julia packages, e.g.,
- [PrettyTables.jl](https://ronisbr.github.io/PrettyTables.jl/stable/) Output of formatted tables
- [DelimitedFiles.jl](https://docs.julialang.org/en/v1/stdlib/DelimitedFiles/) Input and output of matrices, etc.
- [CSV.jl](https://csv.juliadata.org/stable/) Input and output of "comma-separated values" files, etc.
- [XLSX.jl](https://felipenoris.github.io/XLSX.jl/stable/tutorial/) Input and output of Excel files
and many more...
### DelimitedFiles.jl
This package enables convenient saving/reading of matrices. It provides the functions `writedlm()` and `readdlm()`.
```{julia}
using DelimitedFiles
```
We generate a 200×3 matrix of random numbers
```{julia}
A = rand(200,3)
```
and save it
```{julia}
f = open("data2.txt", "w")
writedlm(f, A)
close(f)
```
The written file starts like this:
```{julia}
;head data2.txt
```
Reading it back is even simpler:
```{julia}
B = readdlm("data2.txt")
```
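Both functions also accept a delimiter character, and `readdlm()` additionally an element type. A minimal sketch with a small integer matrix and a scratch file (both chosen only for illustration):

```{.julia}
using DelimitedFiles

M = [1 2 3; 4 5 6]
path = tempname()                  # scratch file, used only in this sketch

writedlm(path, M, ',')             # comma instead of the default tab
text = read(path, String)          # "1,2,3\n4,5,6\n"

M2 = readdlm(path, ',', Int)       # read back as a matrix of Int
rm(path)
```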
One more point: In Julia, the `do` notation is often used for file handling, see @sec-do.
This uses the fact that `open()` also has methods where the 1st argument is a `function(iostream)`.
This function is then applied to the _stream_ and the stream is automatically closed at the end. The `do` notation allows you to
define this function anonymously after the `do`:
```{julia}
open("data2.txt", "w") do io
    writedlm(io, A)
end
```
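The same pattern works for reading. In the following sketch (scratch file and content are assumptions), the value of the `do` block becomes the return value of `open()`, and the last call shows the equivalent form with the function passed explicitly as first argument.

```{.julia}
path = tempname()                  # scratch file, used only in this sketch

open(path, "w") do io
    println(io, "1 2 3")
end                                # stream is closed automatically here

# The value of the do block is returned by open()
total = open(path, "r") do io
    sum(parse.(Int, split(readline(io))))
end

# Equivalent call without do: the function is the first argument
first_line = open(io -> readline(io), path, "r")
rm(path)
```

A practical advantage of this form is that the stream is closed even if the function throws an error.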
### CSV and DataFrames
- The CSV format is often used to provide tables in a form that can be read by many programs, not only MS Excel.
- An example is the weather and climate database _Meteostat_.
- The [DataFrames.jl](https://dataframes.juliadata.org/stable/) package provides functions for convenient handling of tabular data.
```{julia}
using CSV, DataFrames, Downloads
# Weather data from Westerland, see https://dev.meteostat.net/bulk/hourly.html
url = "https://bulk.meteostat.net/v2/hourly/10018.csv.gz"
http_response = Downloads.download(url)
file = CSV.File(http_response, header=false);
```
The data looks like this:
```{julia}
# https://dev.meteostat.net/bulk/hourly.html#endpoints
#
# Column 1 Date
# 2 Time (hour)
# 3 Temperature
# 5 Humidity
# 6 Precipitation
# 8 Wind direction
# 9 Wind speed
df = DataFrame(file)
```
```{julia}
#| error: false
#| echo: false
#| output: false
#| eval: false
describe(df)
```
For convenient plotting and handling of date and time formats in the weather table,
we load two helper packages:
```{julia}
using StatsPlots, Dates
```
We create a new column that combines date (from column 1) and time (from column 2):
```{julia}
# new column combining col. 1 and 2 (date & time)
df[!, :datetime] = DateTime.(df.Column1) .+ Hour.(df.Column2);
```
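The date arithmetic used here can be tried in isolation with the `Dates` standard library. The two small vectors below are made-up stand-ins for the date and hour columns of the table.

```{.julia}
using Dates

dates = [Date(2024, 1, 1), Date(2024, 1, 1), Date(2024, 1, 2)]   # stand-in for column 1
hours = [0, 13, 7]                                               # stand-in for column 2

# Convert each date to a DateTime (midnight) and add the hour as a period
datetimes = DateTime.(dates) .+ Hour.(hours)
```

Broadcasting with the dot applies `DateTime`, `Hour` and `+` element by element, so the result is again a vector with one `DateTime` per row.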
```{julia}
#| error: false
#| echo: false
#| output: false
#| eval: false
@df df plot(:datetime, :Column3)
```
And now to the plot:
```{julia}
@df df plot(:datetime, [:Column9, :Column6, :Column3],
            xlims = (DateTime(2023,9,1), DateTime(2024,5,30)),
            layout=(3,1), title=["Wind" "Rain" "Temp"],
            legend=:none, size=(800,800))
```