% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/synth_new_attr.R
\name{synthetic_new_attribute}
\alias{synthetic_new_attribute}
\title{Add a new attribute to a synthetic_micro dataset}
\usage{
synthetic_new_attribute(df, prob_name = "p", attr_name = "variable",
  attr_vector, attr_levels, conditional_vars = NULL, ht_list = NULL)
}
\arguments{
\item{df}{An R object of class "synthetic_micro".}

\item{prob_name}{A string specifying the name of the column in \code{df} that contains the
probability of each synthetic observation.}

\item{attr_name}{A string specifying the desired name of the new attribute to be added to the data.}

\item{attr_vector}{A named vector specifying the counts or percentages of the new attribute,
or variable, to be added. Its names must contain the patterns used for regular-expression
matching against the levels of the conditioning variables. See Details.}

\item{attr_levels}{A character vector specifying the complete set of levels for the new 
attribute.}

\item{conditional_vars}{A character vector specifying the existing variables, if any, on which 
the new attribute (variable) is to be conditioned. Variables must be specified in order. 
Defaults to \code{NULL}, i.e., an unconditional new attribute.}

\item{ht_list}{A \code{list} of equal length to \code{conditional_vars}. Each element \code{k} of
\code{ht_list} is a \code{data.frame} constructed as a hash-table with one-to-one correspondence  
between \code{ht_list[[k]]} and \code{conditional_vars[k]}. Of the key-value pair, the key is
the first column and the value is the second column. See details.}
}
\value{
A new synthetic_micro dataset with class "synthetic_micro".
}
\description{
Add a new attribute to a synthetic_micro dataset using conditional relationships
between the new attribute and existing attributes (e.g., wage rate conditioned on age and 
education level).
}
\section{Details}{

New synthetic variables are introduced to the existing data via conditional probability. As with 
\code{\link{derive_synth_datasets}}, the goal of this function is to generate a joint 
probability distribution for an attribute vector and to create synthetic individuals from 
this distribution. Although no limit is placed on the number of conditioning variables, 
in practice data rarely exist that support more than two or three. All other variables are 
assumed to be independent of the new attribute (** see note below).

Conditioning is implemented by matching the names of \code{attr_vector} against the existing 
levels of the data. Hash tables (\code{ht_list}) are used to make this matching accurate. 
In each hash table's key-value pair, the key is the actual level of the variable being 
conditioned upon, while the value is the regex string found in 
\code{names(attr_vector)}.
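
As a minimal sketch of this key-value matching, the snippet below builds one hash table 
and uses it to select the matching entries of a named vector. The data, the names 
\code{ht_gender} and \code{selected}, and the regex patterns are hypothetical and chosen 
only for illustration. The snippet uses base R and does not reproduce the function's 
internal code.

\preformatted{
# Hypothetical counts of employment status by gender.
attr_vector <- c(m_employed = 60, m_unemployed = 40,
                 f_employed = 70, f_unemployed = 30)

# Hash table: column 1 (key) holds the level as it appears in the data,
# column 2 (value) holds the regex that matches names(attr_vector).
ht_gender <- data.frame(
  key   = c("Male", "Female"),
  value = c("^m_", "^f_"),
  stringsAsFactors = FALSE
)

# Look up the regex for one level, then keep the matching entries.
rx <- ht_gender[[2]][ht_gender[[1]] == "Female"]
selected <- attr_vector[grepl(rx, names(attr_vector))]
selected
}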

Successive levels of conditioning may be supplied by providing a vector of \code{conditional_vars}
paired with an equal-length list of hash tables (\code{ht_list}). A recursive approach is 
employed to conditionally partition \code{attr_vector}. In this sense, the \emph{order} in which
the conditional variables are supplied matters.
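
The recursive partitioning can be sketched roughly as follows, assuming two conditioning 
variables (gender, then age). The helper \code{partition}, the data, and the regex patterns 
are hypothetical. This is an illustration in base R of why element \code{k} of 
\code{ht_list} must line up with \code{conditional_vars[k]}, not the package's own 
implementation.

\preformatted{
# Hypothetical counts of employment status by gender and age group.
attr_vector <- c(m_young_emp = 10, m_young_unemp = 5,
                 m_old_emp   = 20, m_old_unemp   = 2,
                 f_young_emp = 12, f_young_unemp = 4,
                 f_old_emp   = 27, f_old_unemp   = 3)

# One hash table per conditioning variable, in the same order.
conditional_vars <- c("gender", "age")
ht_list <- list(
  data.frame(key = c("Male", "Female"), value = c("^m_", "^f_"),
             stringsAsFactors = FALSE),
  data.frame(key = c("18-34", "35+"), value = c("_young_", "_old_"),
             stringsAsFactors = FALSE)
)

# Narrow the vector one conditioning variable at a time, then
# normalize what remains into conditional probabilities.
partition <- function(v, lvls, hts) {
  if (length(lvls) == 0) return(v / sum(v))
  ht <- hts[[1]]
  rx <- ht[[2]][ht[[1]] == lvls[1]]
  partition(v[grepl(rx, names(v))], lvls[-1], hts[-1])
}

cond_probs <- partition(attr_vector, c("Female", "35+"), ht_list)
cond_probs
}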

** Four types of conditional/marginal probability model may be considered
for a given new attribute:
\enumerate{
  \item Independence: each variable is assumed to be independent of the others.
  \item Pairwise conditional independence: each attribute is assumed to be related to 
  only one other attribute and independent of all others.
  \item Conditional independence: attributes can be dependent on some subset of other 
  attributes and independent of the rest.
  \item The most general case: all attributes are jointly interrelated.
}
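
As an illustration of the conditional independence case, suppose the existing attributes 
are \eqn{x_1, \ldots, x_n}{x_1, ..., x_n} and the new attribute \eqn{a} is conditioned on 
a subset of them, \eqn{x_{c_1}, \ldots, x_{c_k}}{x_c1, ..., x_ck}. This notation is 
introduced here for illustration only and is not taken from the package. By the chain 
rule, and under the assumption that \eqn{a} is independent of all remaining attributes 
given the conditioning subset, the joint distribution factors as

\deqn{P(x_1, \ldots, x_n, a) = P(x_1, \ldots, x_n) \, P(a \mid x_{c_1}, \ldots, x_{c_k})}{P(x_1, ..., x_n, a) = P(x_1, ..., x_n) * P(a | x_c1, ..., x_ck)}

For example, a synthetic observation with probability 0.02 whose conditioning levels give 
a conditional probability of 0.9 for some level of the new attribute would carry a joint 
probability of 0.018 for that level.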
}

