# Introduction to R language
R is a **dynamically typed, interpreted** language. It is a highly capable environment for computing and graphics and is often labeled a *glue* language. It has built-in functions for statistical computing.
Many great resources on the R language with a focus on data science and statistical data processing already exist and are freely available online. We name just a few and limit the introduction here to the bare minimum used in this class.
**High quality R sources:**
- CRAN manuals like **Introduction to R** at <https://cran.r-project.org/doc/manuals/R-intro.pdf>
- [The R Book](https://www.cs.upc.edu/~robert/teaching/estadistica/TheRBook.pdf) is a very comprehensive source, although strongly oriented towards statistics
- freeCodeCamp's [R programming tutorial](https://www.youtube.com/watch?v=_V8eKsto3Ug&t=4052s&pp=ygUWciBwcm9ncmFtbWluZyBsYW5ndWFnZQ%3D%3D) is a very gentle introduction to R as a scripting language
- [Advanced R (2nd edition)](https://adv-r.hadley.nz/) by Hadley Wickham contains a lot of information about language fundamentals and\
We highly encourage newcomers to the R ecosystem to print and use the various official *cheat sheets* created by **Posit Inc.**
#### Some basic commands {.unnumbered}
```{r, eval=FALSE}
version # start by checking the version of your R, it should be 4.0 or above
getwd() # get the working directory
ls() # list objects in the current environment
print() # print an object to the console
rm() # remove objects from the environment
View() # show the internals of an object
...
```
Then there are commands that work with file structure such as
```{r, eval=FALSE}
list.files()
dir.create()
dir.exists()
file.exists()
...
```
You can also invoke any shell/cmd command together with its arguments through the `system2()` interface.
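As a minimal sketch of how these file helpers combine, the following chunk works inside R's temporary directory, so nothing in your project is touched. The folder name `demo_dir` and the file name `notes.txt` are made up for illustration.

```{r}
# work in a throw-away location; 'demo_dir' is a hypothetical folder name
demo_dir <- file.path(tempdir(), "demo_dir")

dir.create(demo_dir, showWarnings = FALSE)   # create the folder
created <- dir.exists(demo_dir)              # TRUE once the folder exists
created

# write a small text file and inspect the folder
writeLines("hello", file.path(demo_dir, "notes.txt"))
file.exists(file.path(demo_dir, "notes.txt"))
files <- list.files(demo_dir)
files

# clean up after the demonstration
unlink(demo_dir, recursive = TRUE)
```

Building paths with `file.path()` instead of pasting strings keeps the code portable between operating systems.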
## R as scientific calculator
#### Arithmetic operations {.unnumbered}
```{r, eval=TRUE, collapse=TRUE}
1 + 2 # addition
1 - 2 # subtraction
1 / 2 # division
1 * 2 # multiplication
1 %/% 2 # integer division
1 %% 2 # modulo operator
```
#### Special values {.unnumbered}
R is familiar with the concept of $\pm\infty$, hence the `-Inf` and `Inf` values are at your disposal. You will most probably get them as results of computations that numerically approach $\frac{\pm1}{0}$. There are other special values like `NULL` (*null value*), `NA` (*not available*, i.e. a missing value) and `NaN` (*not a number*). The concept of missing values is particularly important, since it has a significant impact on computed results, as the following example with `mean()` shows.
```{r}
x <- seq(1, 10) # sequence of numbers from 1 to 10
x[c(5, 6)] <- NA # set the fifth and sixth element to NA (missing)
print(x)
mean(x) # without removal the result is NA
mean(x, na.rm = TRUE) # with removal the mean of the remaining values is returned
```
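The other special values can be explored in the same way. The chunk below is a small illustrative sketch using only base R functions.

```{r}
1 / 0                          # Inf
-1 / 0                         # -Inf
0 / 0                          # NaN, the result is not defined

is.na(c(1, NA, NaN))           # NA and NaN are both reported as missing
is.nan(c(1, NA, NaN))          # only NaN is "not a number"

length(NULL)                   # NULL is an empty object of length 0

sum(c(1, NA, 3))               # a single NA propagates to the result
sum(c(1, NA, 3), na.rm = TRUE) # unless it is explicitly removed
```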
#### Set operations {.unnumbered}
For manipulating sets, there are a couple of essential functions, `union()`, `intersect()` and `setdiff()`, and the operator `%in%`.
```{r, collapse=TRUE}
set_A <- c("a", "a", "b", "c", "D")
set_B <- c("a", "b", "d")
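union(set_A, set_B) # all distinct elements of both sets
intersect(set_A, set_B) # elements present in both sets
setdiff(set_A, set_B) # elements of set_A that are not in set_B
set_A %in% set_B # element-wise test of membership in set_B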
```
#### Matrix operations {.unnumbered}
For the purpose of the following examples, let's use a system of linear equations with coefficient matrix $\mathbf{A}$ and two vectors $\mathbf{u}$ and $\mathbf{v}$.
$$
\begin{aligned}
2x - 3y &= 3\\
-2y + 4z &= 9\\
2x + 13y + 9z &= 10
\end{aligned}
$$ {#eq-sys-linear}
$$
\mathbf{u} = \begin{pmatrix}
1\\
-3\\
8\\
\end{pmatrix},
\mathbf{v} = \begin{pmatrix}
1\\
-3\\
8\\
\end{pmatrix}
$$ {#eq-uv-vctrs}
Solving a system of linear equations {@eq-sys-linear} is a one-liner:
```{r}
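A <- matrix(data = c(2, -3, 0, 0, -2, 4, 2, 13, 9), nrow = 3, byrow = TRUE) # coefficient matrix, filled row by row
B <- c(3, 9, 10) # right-hand side of the system
solve(A, B) # solution vector (x, y, z)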
```
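The vectors $\mathbf{u}$ and $\mathbf{v}$ can be used to sketch a few more common matrix operations. The object names `b` and `sol` below are chosen for illustration only, and the chunk redefines the matrix so that it runs on its own.

```{r}
A <- matrix(c(2, -3, 0, 0, -2, 4, 2, 13, 9), nrow = 3, byrow = TRUE)
b <- c(3, 9, 10)
u <- c(1, -3, 8)
v <- c(1, -3, 8)

sol <- solve(A, b)                   # solve the linear system
all.equal(as.vector(A %*% sol), b)   # check: multiplying back gives the right-hand side

t(A)                                 # transposition
det(A)                               # determinant, non-zero for a solvable system
A %*% u                              # matrix-vector product
sum(u * v)                           # dot (scalar) product of two vectors
u %o% v                              # outer product, a 3 x 3 matrix
```

Note that `*` multiplies element by element, whereas `%*%` is the true matrix product.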
## R as programming language
### Variables and name conventions
It is possible to use diacritical marks in names, but we highly discourage it, as in `proměnná`, the Czech translation of the term *"variable"*. Most programmers use either `camelNotation` or `snake_notation` for naming. R is *case-sensitive*, so `camelNotation` and `CamelNotation` are two different things. Variable names must not contain spaces, quotes, or arithmetic, logical and relational operators, nor special characters like `=` or `-`.
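A short sketch of these naming rules follows. The invalid names passed to `make.names()` are made up for illustration; the function shows how base R would repair them into syntactically valid names.

```{r}
camelNotation <- 1
CamelNotation <- 2
camelNotation == CamelNotation   # FALSE, the two names refer to different objects

snake_notation <- camelNotation + CamelNotation
snake_notation

# names with spaces or operators are not valid; make.names() replaces
# the offending characters with a dot
repaired <- make.names(c("my variable", "runoff-mm", "snake_notation"))
repaired
```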
### Functions
You can define your own functions using the `function()` construct. If you work in **RStudio**, just type `fun` and press Tab to insert a snippet offered by the IDE, which produces the following template.
```{r, eval=FALSE}
name <- function(variables) {
...
}
```
`name` is the name of the function we would like to create and `variables` are the arguments of that function. The space between `{` and `}` is called the body of the function and contains all the computation that is invoked when the function is called.
Here is an example of creating our own function to calculate the *weighted mean*
$$
\bar{x} = \dfrac{\sum\limits_{i=1}^{n} w_ix_i}{\sum\limits_{i=1}^{n}w_i},
$$
where $x_i$ are the individual measurements and $w_i$ are their weights.
We define a simple function for that purpose and run an example.
```{r}
w_mean <- function(x, w = rep(1, length(x))) {
  # equal weights by default, which gives the ordinary arithmetic mean
  sum(x * w) / sum(w)
}
w_mean(1:10)
```
We can test whether we get the same result as the built-in `weighted.mean()` function using `all.equal()`.
```{r}
all.equal(w_mean(x = 1:5, w = c(0.25, 0.25, 1, 2, 3)),
weighted.mean(x = 1:5, w = c(0.25, 0.25, 1, 2, 3)))
```
Any argument without a default value in the function definition has to be provided in the function call. You will frequently see functions that accept `...`, the so-called *three dot construct* or *ellipsis*. The **ellipsis** allows any number of additional arguments to be passed to a function call, after all the named ones.
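A minimal sketch of the ellipsis in action: the hypothetical helper `mean_and_count()` forwards whatever extra arguments it receives to `mean()`, without having to name them in its own definition.

```{r}
# 'mean_and_count' is a made-up helper for illustration
mean_and_count <- function(x, ...) {
  c(mean = mean(x, ...),   # extra arguments are passed on to mean()
    n = length(x))
}

obs <- c(1, 2, NA, 4)
mean_and_count(obs)                 # the NA propagates to the mean
res <- mean_and_count(obs, na.rm = TRUE)
res                                 # na.rm travelled through '...'
```

This pattern is common in wrappers: the wrapper stays short and the inner function keeps its full flexibility.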
### Data types
The basic types are **logical**, **integer**, **numeric**, **complex**, **character** and **raw**. There are some additional types which we will encounter, like **Date**. Since R is dynamically typed, the user does not need to declare variables before using them. The type also changes without notice based on the stored values, where the coercion chain goes from the least general type to the most general one (logical, integer, numeric, complex, character). The following chunk shows a literal of each basic type except raw.
```{r, collapse=TRUE}
TRUE # logical, also T as short version
1L # integer
1.2 # numeric
1+3i # complex
"A" # character, also 'A'
```
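The silent type change can be observed with `class()`. The sketch below combines values of two different types with `c()` and inspects the result; each step moves one position along the coercion chain.

```{r}
class(TRUE)             # "logical"
class(1L)               # "integer"
class(1.2)              # "numeric"
class(1+3i)             # "complex"
class("A")              # "character"

# combining two types yields the more general one
class(c(TRUE, 1L))      # logical + integer   -> integer
class(c(1L, 1.2))       # integer + numeric   -> numeric
class(c(1.2, 1+3i))     # numeric + complex   -> complex
class(c(1+3i, "A"))     # complex + character -> character
```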
### Data structures
#### Vectors {.unnumbered}
Atomic vectors are single-type linear structures. All elements of a vector share one type, which can be any of **logical**, **integer**, **numeric**, **complex** or **character**.
```{r}
#| label: test-code-annotation
#| echo: fenced
V <- vector(mode = "numeric", length = 0) # empty numeric vector creation
V[1] <- "A" # storing a character value coerces the whole vector to character
```
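A few everyday vector operations, as an illustrative sketch; the values in `v` are arbitrary.

```{r}
v <- c(10, 20, 30, 40)   # combine values into a numeric vector
v[2]                     # second element (indexing starts at 1)
v[-1]                    # everything except the first element
v[v > 15]                # logical indexing keeps elements that satisfy the condition
v * 2                    # arithmetic is applied element by element

V <- vector(mode = "numeric", length = 0)
V[1] <- "A"
typeof(V)                # the vector has silently become a character vector
```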
#### Matrices and arrays {.unnumbered}
If an object has more than one *dimension*, it is treated as an array. A **matrix** is a special, two-dimensional type of **array**. Both object types have accompanying functions like `colSums()` and `rowMeans()`.
```{r, collapse=TRUE}
M <- matrix(data = 0, nrow = 5, ncol = 2) # create a 5 x 2 matrix filled with zeros
M[1, 1] <- 1 # set the single value at the origin
M[, 1] <- 1.5 # store 1.5 in the whole first column
M[c(1,3), 1:2] <- rnorm(2) # store random numbers in rows 1 and 3 (the two values are recycled across both columns)
colMeans(M)
rowSums(M)
```
Matrices can hold any single data type, e.g. characters or complex numbers:
$$
M = \left(\begin{matrix}
\mathrm{A} & \mathrm{B}\\
\mathrm{C} & \mathrm{D}
\end{matrix}\right),\qquad
N = \left(\begin{matrix}
1+i & 5-3i\\
10+2i & i
\end{matrix}\right)
$$
```{r}
```
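The two matrices above could be created as follows. This is only a sketch; the object names `M_chr` and `N_cplx` are chosen here to avoid overwriting the numeric matrix `M` from the previous example.

```{r}
# character matrix, filled row by row
M_chr <- matrix(c("A", "B", "C", "D"), nrow = 2, byrow = TRUE)
M_chr
typeof(M_chr)

# complex matrix; the imaginary unit is written as 1i
N_cplx <- matrix(c(1+1i, 5-3i, 10+2i, 0+1i), nrow = 2, byrow = TRUE)
N_cplx
typeof(N_cplx)
Re(N_cplx)               # real parts of all elements
```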
#### Data frames {.unnumbered}
The `data.frame` structure is the workhorse of elementary data processing. It is a possibly heterogeneous table-like structure, allowing storage of multiple data types (even other structures) in different columns. A *column* in a data frame is called a **variable** and a *row* represents a single **observation**. If the data satisfy this condition, i.e. each variable has its own column and each observation its own row, we say they are in **tidy** format. Processing *tidy* data is a big topic within the R community and the curious reader is encouraged to follow the development of the **tidyverse** package ecosystem.
```{r}
thaya <- data.frame(date = NA,
runoff = NA,
precipitation = NA) # data.frame with a single row of NA values and variables 'date', 'runoff' and 'precipitation'
#thaya$runoff <- rnorm(100, 1, 2)
```
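A data frame is usually more useful when it is created with actual values. The sketch below fills the same three variables with five made-up observations; the numbers are purely illustrative and do not come from any real gauge.

```{r}
set.seed(1)   # make the random numbers reproducible
n <- 5
thaya_demo <- data.frame(
  date          = seq(as.Date("2020-01-01"), by = "day", length.out = n),
  runoff        = rnorm(n, mean = 1, sd = 2),   # made-up values
  precipitation = c(0, 1.2, 0, 5.4, 0.3)        # made-up values
)

str(thaya_demo)                         # compact overview of variables and their types
thaya_demo$runoff                       # access a single variable
wet <- thaya_demo[thaya_demo$precipitation > 1, ]   # keep observations with more than 1 unit of rain
wet
```

Every column keeps its own type, here **Date** and **numeric**, which is what distinguishes a data frame from a matrix.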
#### Lists {.unnumbered}
A **list** is the most general basic data structure. It is possible to store vectors, matrices, data frames and also other lists within a list. A list does not pose any limitations on the lengths of the stored objects.
```{r, eval=TRUE}
l <- list() # empty list creation
l["A"] <- 1 # add an element named "A"
print(l)
l$A <- 2 # overwrite the element using the $ operator
print(l)
```
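The following sketch shows a list holding objects of different types and lengths, together with the difference between `[[ ]]`, which extracts the element itself, and `[ ]`, which returns a shorter list. The element names are made up for illustration.

```{r}
station <- list(
  numbers = 1:3,                 # integer vector of length 3
  labels  = c("a", "b"),         # character vector of length 2
  nested  = list(flag = TRUE)    # another list
)

station$numbers          # access by name
station[["labels"]]      # the element itself (a character vector)
class(station["labels"]) # single brackets return a list
station$nested$flag      # access inside a nested list
lengths(station)         # length of each stored object
```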
#### Other objects {.unnumbered}
Although R is often used as a functional programming language, more than one object-oriented paradigm is implemented in it. New R users first encounter an OOP system in functions like `summary()` and `plot()`, which are so-called **S3 generic functions**. We will also meet the **S4** system when processing geospatial data with packages like `terra` (`sf`, in contrast, builds on S3 classes). OOP in R is a complex topic and will not be discussed further in this text. For further study we recommend the OOP sections of **Advanced R** by Hadley Wickham.
```{r}
```
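To give at least a flavour of S3, the sketch below defines a made-up class `gauge` and a `summary()` method for it. Both the class and its fields are hypothetical; the point is only that the generic `summary()` picks the method according to the class of its argument.

```{r}
# an S3 object is an ordinary structure with a class attribute
g <- structure(
  list(name = "Thaya", runoff = c(1.2, 3.4, 2.1)),
  class = "gauge"                      # hypothetical class name
)

# a method is a function named <generic>.<class>
summary.gauge <- function(object, ...) {
  cat("Gauge", object$name, "with", length(object$runoff), "observations\n")
  invisible(object)
}

summary(g)               # dispatches to summary.gauge()
summary(c(1.2, 3.4))     # the same generic uses a different method for numbers
```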
### Conditions
A condition creates a branching of the computation: it offers at least two paths, of which only one is taken. Conditions are created with the `if()`, `ifelse()` or `switch()` constructs. We can again insert a snippet in RStudio, resulting in
```{r, eval=FALSE}
if (condition) {
...
}
switch (object,
case = action
)
ifelse(test, yes, no)
```
Conditions can be chained into larger structures, for example
```{r}
temperature <- 30
if (temperature > 30) {
cat("The temperature is hot.")
} else if (temperature > 15) {
cat("The temperature is warm.")
}
```
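The other two constructs can be sketched in a similar way. `ifelse()` is vectorised, so it evaluates the test for every element, while `switch()` selects one branch by name. The helper `describe_season()` and its labels are made up for illustration.

```{r}
temperature <- c(35, 20, 5)

# ifelse() works element by element and can be nested
label <- ifelse(temperature > 30, "hot",
                ifelse(temperature > 15, "warm", "cold"))
label

# switch() picks the branch whose name matches the first argument
describe_season <- function(season) {
  switch(season,
         winter = "cold",
         summer = "hot",
         "unknown")      # an unnamed last value serves as the default
}
describe_season("summer")
describe_season("monsoon")
```

Use `if()` for a single logical value and `ifelse()` when a whole vector has to be classified.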
### Repetition
Loops (cycles) provide us with the ability to execute a single statement or a block of code in `{}` multiple times. There are three keywords for loop construction, `for`, `while` and `repeat`, which differ in their use cases.
#### `for` cycle
The `for` loop is probably the most common one. It is used when you **know the number of iterations in advance**, so the iteration is explicitly finite.
```{r, eval=FALSE}
for (variable in vector) {
...
}
```
An example
```{r}
for(i in 1:4) cat(i, ". iteration", "\n", sep = "")
```
#### `while` cycle
**while** is used when it is impossible to state in advance how many times something should be repeated. The logic is rather *while some condition holds, repeat what is inside the body*. It is also used for intentionally infinite loops, e.g. in operating systems.
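As a minimal sketch, the loop below keeps halving a number until it drops to 1 or below. The starting value and the variable names are arbitrary; the number of iterations follows from the condition, not from a counter set up in advance.

```{r}
x <- 100
n_steps <- 0

while (x > 1) {          # the condition is checked before every iteration
  x <- x / 2
  n_steps <- n_steps + 1
}

x                        # first value that is not greater than 1
n_steps                  # how many halvings were needed
```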
#### `repeat` cycle
`repeat` is used when the body has to run at least once. It has no condition of its own, so the code inside is evaluated *until* a condition placed in the body is met and `break` is called.
```{r}
repeat {
  x <- rnorm(1)
  cat(x, "\n") # print each drawn value on its own line
  if (x > 0) break # stop once a positive number is drawn
}
```
#### `break` and `next`
There are two statements which control the flow of iteration. Whenever `break` is called, the rest of the body is skipped and the loop ends. Whenever `next` is called, the rest of the body is skipped and the next iteration starts.
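Both statements can be seen in one short, illustrative loop: even numbers are skipped with `next` and the loop is terminated with `break` once the counter exceeds 4.

```{r}
visited <- c()           # collects the values that reach the end of the body

for (i in 1:6) {
  if (i %% 2 == 0) next  # even number: skip the rest and continue with the next i
  if (i > 4) break       # stop the whole loop
  visited <- c(visited, i)
  cat(i, "\n")
}

visited
```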
### Exercise
1. create a sequence of numbers calling
2. What type is `NA`, and why do you think that is?