% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.train.R
\name{xgb.train}
\alias{xgb.train}
\title{Fit XGBoost Model}
\usage{
xgb.train(
  params = xgb.params(),
  data,
  nrounds,
  evals = list(),
  objective = NULL,
  custom_metric = NULL,
  verbose = 1,
  print_every_n = 1L,
  early_stopping_rounds = NULL,
  maximize = NULL,
  save_period = NULL,
  save_name = "xgboost.model",
  xgb_model = NULL,
  callbacks = list(),
  ...
)
}
\arguments{
\item{params}{List of XGBoost parameters which control the model building process.
See the \href{https://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}
and the documentation for \code{\link[=xgb.params]{xgb.params()}} for details.

Should be passed as list with named entries. Parameters that are not specified in this
list will use their default values.

A list of named parameters can be created through the function \code{\link[=xgb.params]{xgb.params()}}, which
accepts all valid parameters as function arguments.}

\item{data}{Training dataset. \code{xgb.train()} accepts only an \code{xgb.DMatrix} as the input.

Note that there is a function \code{\link[=xgboost]{xgboost()}} which is meant to accept R data objects
as inputs, such as data frames and matrices.}

\item{nrounds}{Max number of boosting iterations.}

\item{evals}{Named list of \code{xgb.DMatrix} datasets to use for evaluating model performance.
Metrics specified in either \code{eval_metric} (under params) or \code{custom_metric} (function
argument here) will be computed for each of these datasets during each boosting iteration,
and stored in the end as a field named \code{evaluation_log} in the resulting object.

When either \code{verbose>=1} is passed or the \code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}} callback is engaged,
the performance results are printed continuously during training.

E.g., specifying \code{evals=list(validation1=mat1, validation2=mat2)} makes it possible to track
the performance of each round's model on \code{mat1} and \code{mat2}.}

\item{objective}{Customized objective function. Should take two arguments: the first one will be the
current predictions (either a numeric vector or matrix depending on the number of targets / classes),
and the second one will be the \code{data} DMatrix object that is used for training.

It should return a list with two elements \code{grad} and \code{hess} (in that order), as either
numeric vectors or numeric matrices depending on the number of targets / classes (same
dimension as the predictions that are passed as first argument).}

\item{custom_metric}{Customized evaluation function. Just like \code{objective}, should take two arguments,
with the first one being the predictions and the second one the \code{data} DMatrix.

Should return a list with two elements \code{metric} (name that will be displayed for this metric,
should be a string / character), and \code{value} (the number that the function calculates, should
be a numeric scalar).

Note that even if passing \code{custom_metric}, objectives also have an associated default metric that
will be evaluated in addition to it. In order to disable the built-in metric, one can pass
parameter \code{disable_default_eval_metric = TRUE}.}

\item{verbose}{If 0, xgboost will stay silent. If 1, it will print information about performance.
If 2, some additional information will be printed out.
Note that setting \code{verbose > 0} automatically engages the
\code{xgb.cb.print.evaluation(period=1)} callback function.}

\item{print_every_n}{When passing \code{verbose>0}, evaluation logs (metrics calculated on the
data passed under \code{evals}) will be printed every nth iteration according to the value passed
here. The first and last iteration are always included regardless of this 'n'.

Only has an effect when passing data under \code{evals} and when passing \code{verbose>0}. The parameter
is passed to the \code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}} callback.}

\item{early_stopping_rounds}{Number of boosting rounds after which training will be stopped
if there is no improvement in performance (as measured by the evaluation metric that is
supplied or selected by default for the objective) on the evaluation data passed under
\code{evals}.

Must pass \code{evals} in order to use this functionality. Setting this parameter adds the
\code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}} callback.

If \code{NULL}, early stopping will not be used.}

\item{maximize}{If \code{custom_metric} and \code{early_stopping_rounds} are set, then this parameter must be set as well.
When \code{TRUE}, larger values of the evaluation score are considered better; when \code{FALSE},
smaller values are considered better.
This parameter is passed to the \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}} callback.}

\item{save_period}{When not \code{NULL}, the model is saved to disk after every \code{save_period} rounds.
A value of 0 means the model is saved at the end. The saving is handled by the \code{\link[=xgb.cb.save.model]{xgb.cb.save.model()}} callback.}

\item{save_name}{Name or path for the periodically saved model file.}

\item{xgb_model}{A previously built model to continue the training from.
Could be either an object of class \code{xgb.Booster}, or its raw data, or the name of a
file with a previously saved model.}

\item{callbacks}{A list of callback functions to perform various tasks during boosting.
See \code{\link[=xgb.Callback]{xgb.Callback()}}. Some of the callbacks are automatically created depending on the
parameters' values. Users can provide either existing or their own callback methods in order
to customize the training process.

Note that some callbacks might try to leave attributes in the resulting model object,
such as an evaluation log (a \code{data.table} object). Be aware that these objects are kept
as R attributes, and thus do not get saved when using XGBoost's own serializers like
\code{\link[=xgb.save]{xgb.save()}} (but are kept when using R serializers like \code{\link[=saveRDS]{saveRDS()}}).}

\item{...}{Not used.

Some arguments that were part of this function in previous XGBoost versions are currently
deprecated or have been renamed. If a deprecated or renamed argument is passed, the function
will throw a warning (by default) and use its current equivalent instead. This warning will
become an error if using the \link[=xgboost-options]{'strict mode' option}.

If some additional argument is passed that is neither a current function argument nor
a deprecated or renamed argument, a warning or error will be thrown depending on the
'strict mode' option.

\bold{Important:} \code{...} will be removed in a future version, and all the current
deprecation warnings will become errors. Please use only arguments that form part of
the function signature.}
}
\value{
An object of class \code{xgb.Booster}.
}
\description{
Fits an XGBoost model to given data in DMatrix format (e.g. as produced by \code{\link[=xgb.DMatrix]{xgb.DMatrix()}}).
See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/model.html}{Introduction to Boosted Trees}
for a longer explanation of what XGBoost does, and the rest of the
\href{https://xgboost.readthedocs.io/en/latest/tutorials/index.html}{XGBoost Tutorials} for further
explanations of XGBoost's features and usage.

Compared to function \code{\link[=xgboost]{xgboost()}} which is a user-friendly function targeted towards interactive
usage, \code{xgb.train} is a lower-level interface which allows finer-grained control and exposes
further functionalities offered by the core library (such as learning-to-rank objectives), but
which works exclusively with XGBoost's own data format ("DMatrices") instead of with regular R
objects.

The syntax of this function closely mimics the same function from the Python package for XGBoost.
It is recommended for package developers over \code{xgboost()}, as it provides a more
stable interface (with fewer breaking changes) and lower overhead from data validations.

See also the \href{https://xgboost.readthedocs.io/en/latest/R-package/migration_guide.html}{migration guide}
if coming from a previous version of XGBoost in the 1.x series.
}
\details{
Compared to \code{\link[=xgboost]{xgboost()}}, the \code{xgb.train()} interface supports advanced features such as
\code{evals} and customized objective and evaluation metric functions, among others, with the
difference that these work with \code{xgb.DMatrix} objects and do not follow typical R idioms.

Parallelization is automatically enabled if OpenMP is present.
Number of threads can also be manually specified via the \code{nthread} parameter.

In other XGBoost language bindings, the random seed defaults to zero. In R, if the parameter \code{seed}
is not manually supplied, a random seed is generated through R's own random number generator,
whose seed in turn is controllable through \code{set.seed}. If \code{seed} is passed, it will override the
RNG from R.

The following callbacks are automatically created when certain parameters are set:
\itemize{
\item \code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}}: When \code{verbose > 0}. The \code{print_every_n}
parameter is passed to it.
\item \code{\link[=xgb.cb.evaluation.log]{xgb.cb.evaluation.log()}}: When \code{evals} is present.
\item \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}}: When \code{early_stopping_rounds} is set.
\item \code{\link[=xgb.cb.save.model]{xgb.cb.save.model()}}: When \code{save_period > 0} is set.
}

Note that objects of type \code{xgb.Booster} as returned by this function behave a bit differently
from typical R objects (the class is an 'altrep' list), and that they separate two kinds of attributes:
internal booster attributes (restricted to JSON-serializable data), which are accessed through \code{\link[=xgb.attr]{xgb.attr()}}
and shared between interfaces through serialization functions like \code{\link[=xgb.save]{xgb.save()}}; and
R-specific attributes (typically the result of a callback), which are accessed through \code{\link[=attributes]{attributes()}}
and \code{\link[=attr]{attr()}}, are used only in the R interface, are kept only when using
R's serializers like \code{\link[=saveRDS]{saveRDS()}}, and
are not used in any way by functions like \code{predict.xgb.Booster()}.

Be aware that one such R attribute that is automatically added is \code{params} - this attribute
is assigned from the \code{params} argument to this function, and is only meant to serve as a
reference for what went into the booster, but is not used in other methods that take a booster
object - so for example, changing the booster's configuration requires calling \verb{xgb.config<-}
or \verb{xgb.model.parameters<-}, while simply modifying \verb{attributes(model)$params$<...>} will have no
effect elsewhere.
}
\examples{
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

## Keep the number of threads to 1 for examples
nthread <- 1
data.table::setDTthreads(nthread)

dtrain <- with(
  agaricus.train, xgb.DMatrix(data, label = label, nthread = nthread)
)
dtest <- with(
  agaricus.test, xgb.DMatrix(data, label = label, nthread = nthread)
)
evals <- list(train = dtrain, eval = dtest)

## A simple xgb.train example:
param <- xgb.params(
  max_depth = 2,
  nthread = nthread,
  objective = "binary:logistic",
  eval_metric = "auc"
)
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
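
## A brief sketch of inspecting the result: the metrics computed on 'evals'
## are kept as an R attribute of the booster named 'evaluation_log'.
eval_log <- attributes(bst)$evaluation_log
print(eval_log)

## Internal booster attributes, in contrast, go through xgb.attr().
## The attribute name "note" is purely illustrative:
xgb.attr(bst, "note") <- "fitted on agaricus.train"
xgb.attr(bst, "note")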

## An xgb.train example where custom objective and evaluation metric are
## used:
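logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  # With a custom objective, 'preds' are raw margins: map them to probabilities
  preds <- 1 / (1 + exp(-preds))
  # First and second derivatives of the logistic loss w.r.t. the margin
  grad <- preds - labels
  hess <- preds * (1 - preds)
  return(list(grad = grad, hess = hess))
}
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  # Classification error, thresholding the raw margin at 0 (probability 0.5)
  err <- as.numeric(sum(labels != (preds > 0))) / length(labels)
  return(list(metric = "error", value = err))
}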

# These functions could be used by passing them as 'objective' and
# 'eval_metric' parameters in the params list:
param <- xgb.params(
  max_depth = 2,
  nthread = nthread,
  objective = logregobj,
  eval_metric = evalerror
)
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)

# ... or as dedicated 'objective' and 'custom_metric' parameters of xgb.train:
bst <- xgb.train(
  within(param, rm("objective", "eval_metric")),
  dtrain, nrounds = 2, evals = evals,
  objective = logregobj, custom_metric = evalerror
)
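
## Reproducibility, as a minimal sketch: no 'seed' is passed in the
## parameters here, so the seed is drawn from R's RNG and set.seed() should
## make the row subsampling repeatable. Object names are illustrative.
param_sub <- xgb.params(
  max_depth = 2,
  subsample = 0.5,
  nthread = nthread,
  objective = "binary:logistic"
)
set.seed(123)
bst_a <- xgb.train(param_sub, dtrain, nrounds = 2, verbose = 0)
set.seed(123)
bst_b <- xgb.train(param_sub, dtrain, nrounds = 2, verbose = 0)
all.equal(predict(bst_a, dtest), predict(bst_b, dtest))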


## An xgb.train example of using variable learning rates at each iteration:
param <- xgb.params(
  max_depth = 2,
  learning_rate = 1,
  nthread = nthread,
  objective = "binary:logistic",
  eval_metric = "auc"
)
my_learning_rates <- list(learning_rate = c(0.5, 0.1))

bst <- xgb.train(
  param,
  dtrain,
  nrounds = 2,
  evals = evals,
  verbose = 0,
  callbacks = list(xgb.cb.reset.parameters(my_learning_rates))
)
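
## Continuing training from an existing booster, sketched with the
## 'xgb_model' argument: boosting resumes from 'bst' for two more rounds.
## The name 'bst_more' is illustrative.
bst_more <- xgb.train(
  param,
  dtrain,
  nrounds = 2,
  evals = evals,
  verbose = 0,
  xgb_model = bst
)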

## Early stopping:
bst <- xgb.train(
  param, dtrain, nrounds = 25, evals = evals, early_stopping_rounds = 3
)
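
## Early stopping on a custom metric, as a hedged sketch: 'maximize' has to
## be set when 'custom_metric' and 'early_stopping_rounds' are combined.
## A lower error is better here, hence maximize = FALSE. The built-in
## default metric is disabled so that only the custom one is tracked.
bst <- xgb.train(
  list(max_depth = 2, nthread = nthread, disable_default_eval_metric = TRUE),
  dtrain, nrounds = 25, evals = evals, verbose = 0,
  objective = logregobj, custom_metric = evalerror,
  early_stopping_rounds = 3, maximize = FALSE
)
## Assumption: the early stopping callback records the best round as an
## internal booster attribute named "best_iteration"; xgb.attr() returns
## NULL when no such attribute exists.
xgb.attr(bst, "best_iteration")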
}
\references{
Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System",
22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, \url{https://arxiv.org/abs/1603.02754}
}
\seealso{
\code{\link[=xgb.Callback]{xgb.Callback()}}, \code{\link[=predict.xgb.Booster]{predict.xgb.Booster()}}, \code{\link[=xgb.cv]{xgb.cv()}}
}
