Mutating join — coalesce_join • hakedataUSA

A join that adds information from matching columns from y to x. If the value in y is NA, then the value from x will be used. Thus, all new information can be used to overwrite the old information.

coalesce_join(
  x,
  y,
  by = NULL,
  suffix = c(".x", ".y"),
  join = dplyr::full_join,
  ...
)

Arguments

x, y

A pair of data frames, data frame extensions (e.g. a tibble), or lazy data frames (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

by

A join specification created with join_by(), or a character vector of variables to join by.

If NULL, the default, *_join() will perform a natural join, using all variables in common across x and y. A message lists the variables so that you can check they're correct; suppress the message by supplying by explicitly.

To join on different variables between x and y, use a join_by() specification. For example, join_by(a == b) will match x$a to y$b.

To join by multiple variables, use a join_by() specification with multiple expressions. For example, join_by(a == b, c == d) will match x$a to y$b and x$c to y$d. If the column names are the same between x and y, you can shorten this by listing only the variable names, like join_by(a, c).

join_by() can also be used to perform inequality, rolling, and overlap joins. See the documentation at ?join_by for details on these types of joins.

For simple equality joins, you can alternatively specify a character vector of variable names to join by. For example, by = c("a", "b") joins x$a to y$a and x$b to y$b. If variable names differ between x and y, use a named character vector like by = c("x_a" = "y_a", "x_b" = "y_b").

To perform a cross-join, generating all combinations of x and y, see cross_join().

suffix

If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.

join

The dplyr function, or function from any other package, that should be used to join x and y. The default is to perform a full join, i.e., dplyr::full_join() between the two data frames.

...

Any additional arguments you wish to supply to the function specified in join.

References

https://alistaire.rbind.io/blog/coalescing-joins/

Author

Edward Visel with some changes from Kelli F. Johnson.