To use memory efficiently when computing the quantities needed for optimization, the logistic action reads your data table in multiple batches, or pages, each of which contains at most a certain number of observations. During the optimization, the action reads the first page of observations from the data table into memory, creates the design rows, performs the log-likelihood, gradient, and Hessian computations for that page, and then discards those observations and reads the next page of data for processing.
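The page-by-page accumulation described above can be sketched as follows. This is a minimal illustration for a binary logistic model with an intercept, not the logistic action's implementation; the observation format, the `pages`, `design_row`, and `paged_loglik_grad_hess` names, and the page-size argument are all hypothetical. The point it demonstrates is that the log-likelihood, gradient, and Hessian are sums over observations, so they can be accumulated one page at a time while only that page's design rows are held in memory.

```python
import math
from typing import Iterator, List, Sequence, Tuple

# Hypothetical observation format: (covariates, binary response 0/1).
Observation = Tuple[Sequence[float], int]


def pages(data: Sequence[Observation], page_size: int) -> Iterator[Sequence[Observation]]:
    """Yield consecutive pages that hold at most page_size observations."""
    for start in range(0, len(data), page_size):
        yield data[start:start + page_size]


def design_row(covariates: Sequence[float]) -> List[float]:
    """Build a design row: an intercept column followed by the covariates."""
    return [1.0, *covariates]


def log1p_exp(eta: float) -> float:
    """Numerically stable log(1 + exp(eta))."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def paged_loglik_grad_hess(data, beta, page_size):
    """Accumulate log-likelihood, gradient, and Hessian one page at a time."""
    p = len(beta)
    loglik = 0.0
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for page in pages(data, page_size):
        # Design rows are created only for the observations on this page.
        rows = [(design_row(x), y) for x, y in page]
        for x, y in rows:
            eta = sum(xj * bj for xj, bj in zip(x, beta))
            mu = math.exp(eta - log1p_exp(eta))  # stable logistic function
            loglik += y * eta - log1p_exp(eta)
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (y - mu) * x[j]
                for k in range(p):
                    hess[j][k] -= w * x[j] * x[k]
        # The page and its design rows are replaced when the next page is read;
        # only the running sums persist across pages.
    return loglik, grad, hess
```

Because the sums do not depend on how the observations are split into pages, any page size gives the same result; only the memory footprint and the amount of rereading change.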
Generally, smaller pages use less memory but can lead to longer computation times, whereas larger pages can run faster but use more memory. In particular, with sufficient memory, the optimization is typically fastest when all your data fit on a single page, because the action then does not have to repeatedly reread the data table and recompute the design rows.
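The single-page advantage can be illustrated with a deliberately simplified cost model. This sketch assumes, as a hypothetical simplification, that each optimization iteration makes exactly one pass over the data and that a single page's design rows are built once and kept in memory; the function names are illustrative only and do not describe the logistic action's internal bookkeeping.

```python
import math


def design_row_builds(n_obs: int, page_size: int, iterations: int) -> int:
    """Design rows built over a whole optimization (simplified cost model)."""
    n_pages = math.ceil(n_obs / page_size)
    if n_pages <= 1:
        # One page: the data are read once and the design rows are reused.
        return n_obs
    # Several pages: assume every iteration rereads each page and rebuilds its rows.
    return n_obs * iterations


def peak_observations_in_memory(n_obs: int, page_size: int) -> int:
    """At most one page of observations is held in memory at a time."""
    return min(n_obs, page_size)
```

For example, with 1,000 observations and 20 iterations, a single page of 1,000 observations builds 1,000 design rows in total but holds all 1,000 in memory, whereas pages of 256 observations hold at most 256 at a time but build 20,000 design rows.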
When you use the default value of the maxOptBatch parameter, the logistic action determines that you have enough memory to use a single page for the optimization if the number of observations in your data table for each thread on each machine node is less than
where
| Symbol | Definition |
|---|---|
| b | the approximate number of bytes available to a machine node, or 1 GB |
| t | the number of threads available to a machine node |
| p | the number of parameters in your model |
| r | 2n + 10 for continuous response models; 2n + y + 9 for categorical response models |
| n | the number of observations in the data table |
| y | the number of response levels for categorical response models |
If the logistic action determines that you do not have enough memory, it sets the maxOptBatch parameter to 256 so that the matrix computations remain efficient, and it displays a note that gives the maxOptBatch value required to store all the data for a machine node on a single page.
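The decision described above can be sketched as follows. Because the threshold formula is not reproduced here, this sketch takes it as an input (`single_page_limit`) rather than computing it from b, t, p, and r. It also assumes, hypothetically, that maxOptBatch counts observations per thread, so the single-page value is the node's observation count divided by its thread count, rounded up. The helper names and the note text are illustrative, not the action's actual messages.

```python
import math
from dataclasses import dataclass
from typing import Optional

PAGED_MAX_OPT_BATCH = 256  # fallback value stated in the documentation


@dataclass
class BatchDecision:
    max_opt_batch: int
    note: Optional[str]


def observations_per_thread(obs_on_node: int, threads: int) -> int:
    """Hypothetical helper: observations each thread handles on one node."""
    return math.ceil(obs_on_node / threads)


def choose_max_opt_batch(obs_on_node: int, threads: int,
                         single_page_limit: float) -> BatchDecision:
    """Pick a page size; single_page_limit is the documented threshold, supplied by the caller."""
    per_thread = observations_per_thread(obs_on_node, threads)
    if per_thread < single_page_limit:
        # Enough memory: one page holds all of this thread's observations.
        return BatchDecision(per_thread, None)
    # Not enough memory: fall back to 256 and report the single-page value.
    note = (f"Setting maxOptBatch={per_thread} would store all the data "
            f"for this machine node on a single page.")
    return BatchDecision(PAGED_MAX_OPT_BATCH, note)
```

For instance, with 80,000 observations on a node with 4 threads and a hypothetical limit of 5,000 observations per thread, the sketch falls back to pages of 256 and reports that a value of 20,000 would be needed for a single page.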