Get the number of unique records per group, where the groups are defined by the interaction of all columns listed in xvar. Typically, fewer than three unique entries in a group can expose confidential data. This function lets you check whether the number of unique labels (e.g., vessel names) within each category (e.g., each combination of month and year) is less than three. The data can then be subset using the new column, ngroups.

get_confidential(data, xvar, yvar = "VESSEL")

Arguments

data

A data frame.

xvar

A vector of column names available in data.

yvar

A single character value giving the name of the column in data that is the dependent variable, i.e., the column whose unique entries are counted. Only one entry is allowed, not a vector, and it must match an existing column name exactly.

Value

The input data frame with a new column, ngroups, that gives the number of unique yvar entries in each group.
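The following base R sketch illustrates the kind of result described above and how ngroups might be used to subset data. It is not the package's implementation; the landings data frame, its values, and the use of ave are assumptions made for illustration only.

```r
# Toy data; the data frame name, columns, and values are hypothetical.
landings <- data.frame(
  year   = c(2020, 2020, 2020, 2020, 2021, 2021),
  month  = c(1, 1, 1, 1, 1, 1),
  VESSEL = c("A", "B", "C", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# One grouping label per combination of the xvar columns.
xvar <- c("year", "month")
group <- interaction(landings[xvar], drop = TRUE)

# Count the unique vessels in each group and repeat that count on every row
# of the group. Row indices are passed to ave() so the result stays numeric.
landings$ngroups <- ave(
  seq_len(nrow(landings)), group,
  FUN = function(i) length(unique(landings$VESSEL[i]))
)

# Keep only the rows from groups with at least three unique vessels.
nonconfidential <- landings[landings$ngroups >= 3, ]
```

In this toy example, the 2020 group has three unique vessels and is kept, whereas the 2021 group has only two and is dropped.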

Details

If you want to use two variables for yvar, you will need to pre-process your data. The interaction function is helpful here because it can take any number of columns and create a new column that is the combination of everything you want. This option is better than using paste to combine columns. You will not always want the interaction, though; sometimes you might want to combine columns with ifelse calls instead. For example, with hake, catcher boats deliver to motherships, and each type of vessel has its own name column in the data frame. If the catcher boat entry is NA, you can assume that a name is present in the mothership column, because the record might come from a catcher-processor vessel, which does not need a catcher boat. So, where is.na(data[, "catcherboat"]) is TRUE, use data[, "mothership"].
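The pre-processing described above might look like the following sketch. The hake data frame, its values, and the port column are hypothetical; only the catcherboat and mothership column names come from the text above.

```r
# Toy data; values and the "port" column are made up for illustration.
hake <- data.frame(
  year        = c(2020, 2020, 2020),
  catcherboat = c("CB1", NA, "CB2"),
  mothership  = c("MS1", "CP1", "MS1"),
  port        = c("Newport", "Astoria", "Newport"),
  stringsAsFactors = FALSE
)

# Use the catcher boat name when present; otherwise fall back to the
# name in the mothership column.
hake$VESSEL <- ifelse(
  is.na(hake[, "catcherboat"]),
  hake[, "mothership"],
  hake[, "catcherboat"]
)

# Alternatively, combine two columns into a single label with interaction().
hake$vessel_port <- interaction(hake$mothership, hake$port, drop = TRUE)

# Either new column could then be passed as yvar, e.g.,
# get_confidential(hake, xvar = "year", yvar = "VESSEL")
```

Note that interaction returns NA for any row where one of its inputs is NA, which is one reason an ifelse call can be the better choice when a column has missing names.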

Author

Kelli F. Johnson