hddm Package¶
model Module¶
- class hddm.models.HDDMBase(data, bias=False, include=(), wiener_params=None, p_outlier=0.0, **kwargs)¶
HDDM base class. Not intended to be used directly. Instead, use hddm.HDDM.
Methods
append_stats_to_nodes_db(*args, **kwargs) smart call of MCMC.stats() for the model approximate_map([fall_to_simplex]) Set model to its approximate MAP. create_knodes() create_model([max_retries]) Set group level distributions. create_nodes_db() find_starting_values() Find good starting values for the different parameters by gen_stats([fname, print_hidden]) print statistics of all variables get_average_model([quantiles]) get_group_nodes([stochastic]) get_group_traces() Returns a DataFrame containing traces of all stochastic get_non_observeds() get_observeds() get_stochastics() get_subj_nodes([stochastic]) get_traces() Returns a DataFrame containing traces of all stochastic iter_group_nodes(**kwargs) iter_non_observeds() iter_observeds() iter_stochastics() iter_subj_nodes(**kwargs) load_db(dbname[, verbose, db]) Load samples from a database created by an earlier model run (e.g. map([runs, warn_crit, method]) Find MAP and set optimized values to nodes. mcmc([assign_step_methods]) Returns pymc.MCMC object of model. optimize(method[, quantiles, n_runs, ...]) Optimize model using ML, chi^2 or G^2. plot_posterior_predictive(*args, **kwargs) plot_posterior_quantiles(*args, **kwargs) Plot posterior predictive quantiles. plot_posteriors([params, plot_subjs, save]) plot the nodes posteriors pre_sample() print_stats([fname, print_hidden]) print statistics of all variables sample(*args, **kwargs) Sample from posterior. save(fname) Save model to file. set_values(new_values) set values of nodes according to new_values - plot_posterior_quantiles(*args, **kwargs)¶
Plot posterior predictive quantiles.
Arguments : model : HDDM model
Optional : - value_range : numpy.ndarray
Range over which to evaluate the CDF.
- samples : int (default=10)
Number of posterior samples to use.
- alpha : float (default=.75)
Alpha (transparency) of posterior quantiles.
- hexbin : bool (default=False)
Whether to plot posterior quantile density using hexbin.
- data_plot_kwargs : dict (default=None)
Forwarded to data plotting function call.
- predictive_plot_kwargs : dict (default=None)
Forwareded to predictive plotting function call.
- columns : int (default=3)
How many columns to use for plotting the subjects.
- save : bool (default=False)
Whether to save the figure to a file.
- path : str (default=None)
Save figure into directory prefix
- class hddm.models.HDDM(*args, **kwargs)¶
Create hierarchical drift-diffusion model in which each subject has a set of parameters that are constrained by a group distribution.
Arguments : - data : pandas.DataFrame
Input data with a row for each trial. Must contain the following columns:
- ‘rt’: Reaction time of trial in seconds.
- ‘response’: Binary response (e.g. 0->error, 1->correct)
- ‘subj_idx’: A unique ID (int) of each subject.
- Other user-defined columns that can be used in depends_on keyword.
Optional : - informative : bool <default=True>
Whether to use informative priors (True) or vague priors (False). Information about the priors can be found in the methods section. If you run a classical DDM experiment you should use this. However, if you apply the DDM to a novel domain like saccade data where RTs are much lower, or RTs of rats, you should probably set this to False.
- is_group_model : bool
If True, this results in a hierarchical model with separate parameter distributions for each subject. The subject parameter distributions are themselves distributed according to a group parameter distribution.
- depends_on : dict
Specifies which parameter depends on data of a column in data. For each unique element in that column, a separate set of parameter distributions will be created and applied. Multiple columns can be specified in a sequential container (e.g. list)
Example: >>> hddm.HDDM(data, depends_on={'v': 'difficulty'})
Separate drift-rate parameters will be estimated for each difficulty. Requires ‘data’ to have a column difficulty.
- bias : bool
Whether to allow a bias to be estimated. This is normally used when the responses represent left/right and subjects could develop a bias towards responding right. This is normally never done, however, when the ‘response’ column codes correct/error.
- p_outlier : double (default=0)
The probability of outliers in the data. if p_outlier is passed in the ‘include’ argument, then it is estimated from the data and the value passed using the p_outlier argument is ignored.
- default_intervars : dict (default = {‘sz’: 0, ‘st’: 0, ‘sv’: 0})
Fix intertrial variabilities to a certain value. Note that this will only have effect for variables not estimated from the data.
- plot_var : bool
Plot group variability parameters when calling pymc.Matplot.plot() (i.e. variance of Normal distribution.)
- trace_subjs : bool
Save trace for subjs (needed for many statistics so probably a good idea.)
- wiener_params : dict
Parameters for wfpt evaluation and numerical integration.
Parameters: - err: Error bound for wfpt (default 1e-4)
- n_st: Maximum depth for numerical integration for st (default 2)
- n_sz: Maximum depth for numerical integration for Z (default 2)
- use_adaptive: Whether to use adaptive numerical integration (default True)
- simps_err: Error bound for Simpson integration (default 1e-3)
Example : >>> data, params = hddm.generate.gen_rand_data() # gen data >>> model = hddm.HDDM(data) # create object >>> mcmc.sample(5000, burn=20) # Sample from posterior
Methods
append_stats_to_nodes_db(*args, **kwargs) smart call of MCMC.stats() for the model approximate_map([fall_to_simplex]) Set model to its approximate MAP. create_knodes() create_model([max_retries]) Set group level distributions. create_nodes_db() find_starting_values() Find good starting values for the different parameters by gen_stats([fname, print_hidden]) print statistics of all variables get_average_model([quantiles]) get_group_nodes([stochastic]) get_group_traces() Returns a DataFrame containing traces of all stochastic get_non_observeds() get_observeds() get_stochastics() get_subj_nodes([stochastic]) get_traces() Returns a DataFrame containing traces of all stochastic iter_group_nodes(**kwargs) iter_non_observeds() iter_observeds() iter_stochastics() iter_subj_nodes(**kwargs) load_db(dbname[, verbose, db]) Load samples from a database created by an earlier model run (e.g. map([runs, warn_crit, method]) Find MAP and set optimized values to nodes. mcmc([assign_step_methods]) Returns pymc.MCMC object of model. optimize(method[, quantiles, n_runs, ...]) Optimize model using ML, chi^2 or G^2. plot_posterior_predictive(*args, **kwargs) plot_posterior_quantiles(*args, **kwargs) Plot posterior predictive quantiles. plot_posteriors([params, plot_subjs, save]) plot the nodes posteriors pre_sample([use_slice]) print_stats([fname, print_hidden]) print statistics of all variables sample(*args, **kwargs) Sample from posterior. save(fname) Save model to file. set_values(new_values) set values of nodes according to new_values
- class hddm.models.HDDMStimCoding(*args, **kwargs)¶
HDDM model that can be used when stimulus coding and estimation of bias (i.e. displacement of starting point z) is required.
In that case, the ‘resp’ column in your data should contain 0 and 1 for the chosen stimulus (or direction), not whether the response was correct or not as you would use in accuracy coding. You then have to provide another column (referred to as stim_col) which contains information about which the correct response was.
Arguments : - split_param : str (‘v’ or ‘z’)
There are two ways to model stimulus coding in the case where both stimuli have equal information (so that there can be no difference in drift): * ‘z’: Use z for stimulus A and 1-z for stimulus B * ‘v’: Use drift v for stimulus A and -v for stimulus B
- stim_col : str
Column name for extracting the stimuli to use for splitting.
Methods
append_stats_to_nodes_db(*args, **kwargs) smart call of MCMC.stats() for the model approximate_map([fall_to_simplex]) Set model to its approximate MAP. create_knodes() create_model([max_retries]) Set group level distributions. create_nodes_db() find_starting_values() Find good starting values for the different parameters by gen_stats([fname, print_hidden]) print statistics of all variables get_average_model([quantiles]) get_group_nodes([stochastic]) get_group_traces() Returns a DataFrame containing traces of all stochastic get_non_observeds() get_observeds() get_stochastics() get_subj_nodes([stochastic]) get_traces() Returns a DataFrame containing traces of all stochastic iter_group_nodes(**kwargs) iter_non_observeds() iter_observeds() iter_stochastics() iter_subj_nodes(**kwargs) load_db(dbname[, verbose, db]) Load samples from a database created by an earlier model run (e.g. map([runs, warn_crit, method]) Find MAP and set optimized values to nodes. mcmc([assign_step_methods]) Returns pymc.MCMC object of model. optimize(method[, quantiles, n_runs, ...]) Optimize model using ML, chi^2 or G^2. plot_posterior_predictive(*args, **kwargs) plot_posterior_quantiles(*args, **kwargs) Plot posterior predictive quantiles. plot_posteriors([params, plot_subjs, save]) plot the nodes posteriors pre_sample([use_slice]) print_stats([fname, print_hidden]) print statistics of all variables sample(*args, **kwargs) Sample from posterior. save(fname) Save model to file. set_values(new_values) set values of nodes according to new_values
- class hddm.models.HDDMRegressor(data, models, group_only_regressors=True, **kwargs)¶
HDDMRegressor allows estimation of the DDM where parameter values are linear models of a covariate (e.g. a brain measure like fMRI or different conditions).
Methods
append_stats_to_nodes_db(*args, **kwargs) smart call of MCMC.stats() for the model approximate_map([fall_to_simplex]) Set model to its approximate MAP. create_knodes() create_model([max_retries]) Set group level distributions. create_nodes_db() find_starting_values() Find good starting values for the different parameters by gen_stats([fname, print_hidden]) print statistics of all variables get_average_model([quantiles]) get_group_nodes([stochastic]) get_group_traces() Returns a DataFrame containing traces of all stochastic get_non_observeds() get_observeds() get_stochastics() get_subj_nodes([stochastic]) get_traces() Returns a DataFrame containing traces of all stochastic iter_group_nodes(**kwargs) iter_non_observeds() iter_observeds() iter_stochastics() iter_subj_nodes(**kwargs) load_db(dbname[, verbose, db]) Load samples from a database created by an earlier model run (e.g. map([runs, warn_crit, method]) Find MAP and set optimized values to nodes. mcmc([assign_step_methods]) Returns pymc.MCMC object of model. optimize(method[, quantiles, n_runs, ...]) Optimize model using ML, chi^2 or G^2. plot_posterior_predictive(*args, **kwargs) plot_posterior_quantiles(*args, **kwargs) Plot posterior predictive quantiles. plot_posteriors([params, plot_subjs, save]) plot the nodes posteriors pre_sample([use_slice]) print_stats([fname, print_hidden]) print statistics of all variables sample(*args, **kwargs) Sample from posterior. save(fname) Save model to file. set_values(new_values) set values of nodes according to new_values
generate Module¶
- hddm.generate.add_outliers(data, n_fast, n_slow, seed=None)¶
add outliers to data. outliers are distrbuted randomly across condition. Input: data - data n_fast/n_slow - numberprobability of fast/slow outliers
- hddm.generate.gen_rand_data(params=None, n_fast_outliers=0, n_slow_outliers=0, **kwargs)¶
Generate simulated RTs with random parameters.
Optional : - params : dict <default=generate randomly>
Either dictionary mapping param names to values.
Or dictionary mapping condition name to parameter dictionary (see example below).
If not supplied, takes random values.
- n_fast_outliers : int <default=0>
How many fast outliers to add (outlier_RT < ter)
- n_slow_outliers : int <default=0>
How many late outliers to add.
The rest of the arguments are forwarded to kabuki.generate.gen_rand_data
Returns : data array with RTs parameter values
Example : # Generate random data set >>> data, params = hddm.generate.gen_rand_data({‘v’:0, ‘a’:2, ‘t’:.3},
size=100, subjs=5)
# Generate 2 conditions >>> data, params = hddm.generate.gen_rand_data({‘cond1’: {‘v’:0, ‘a’:2, ‘t’:.3},
‘cond2’: {‘v’:1, ‘a’:2, ‘t’:.3}})
Notes : Wrapper function for kabuki.generate.gen_rand_data. See the help doc of that function for more options.
- hddm.generate.gen_rand_params(include=(), cond_dict=None, seed=None)¶
Returns a dict of DDM parameters with random values.
Optional : - include : tuple
Which optional parameters include. Can be any combination of:
- ‘z’ (bias, default=0.5)
- ‘sv’ (inter-trial drift variability)
- ‘sz’ (inter-trial bias variability)
- ‘st’ (inter-trial non-decision time variability)
Special arguments are: * ‘all’: include all of the above * ‘all_inter’: include all of the above except ‘z’
- cond_dict : dictionary
cond_dict is used when multiple conditions are desired. the dictionary has the form of {param1: [value_1, ... , value_n], param2: [value_1, ... , value_n]} and the function will output n sets of parameters. each set with values from the appropriate place in the dictionary for instance if cond_dict={‘v’: [0, 0.5, 1]} then 3 parameters set will be created. the first with v=0 the second with v=0.5 and the third with v=1.
- seed: float
random seed
Output: if conditions is None:
- params: dictionary
a dictionary holding the parameters values
- else:
- cond_params: a dictionary holding the parameters for each one of the conditions,
that has the form {‘c1’: params1, ‘c2’: params2, ...} it can be used directly as an argument in gen_rand_data.
- merged_params:
a dictionary of parameters that can be used to validate the optimization and learning algorithms.
- hddm.generate.gen_rts(size=1000, range_=(-6, 6), dt=0.001, intra_sv=1.0, structured=True, subj_idx=None, method='cdf', **params)¶
A private function used by gen_rand_data Returns a DataFrame of randomly simulated RTs from the DDM.
Arguments : - params : dict
Parameter names and values to use for simulation.
Optional : - size : int
Number of RTs to simulate.
- range_ : tuple
Minimum (negative) and maximum (positve) RTs.
- dt : float
Number of steps/sec.
- intra_sv : float
Intra-trial variability.
- structured : bool
Return a structured array with fields ‘RT’ and ‘response’.
- subj_idx : int
If set, append column ‘subj_idx’ with value subj_idx.
- method : str
- Which method to use to simulate the RTs:
- ‘cdf’: fast, uses the inverse of cumulative density function to sample, dt can be 1e-2.
- ‘drift’: slow, simulates each complete drift process, dt should be 1e-4.
- hddm.generate.gen_single_params_set(include=())¶
Returns a dict of DDM parameters with random values for a singel conditin the function is used by gen_rand_params.
Optional : - include : tuple
Which optional parameters include. Can be any combination of:
- ‘z’ (bias, default=0.5)
- ‘sv’ (inter-trial drift variability)
- ‘sz’ (inter-trial bias variability)
- ‘st’ (inter-trial non-decision time variability)
Special arguments are: * ‘all’: include all of the above * ‘all_inter’: include all of the above except ‘z’
- hddm.generate.pdf_with_params(rt, params)¶
Helper function that calls full_pdf and gets the parameters from the dict params.
- hddm.generate.rand(d0, d1, ..., dn)¶
Random values in a given shape.
Create an array of the given shape and propagate it with random samples from a uniform distribution over [0, 1).
Parameters : d0, d1, ..., dn : int, optional
The dimensions of the returned array, should all be positive. If no argument is given a single Python float is returned.
Returns : out : ndarray, shape (d0, d1, ..., dn)
Random values.
See also
random
Notes
This is a convenience function. If you want an interface that takes a shape-tuple as the first argument, refer to np.random.random_sample .
Examples
>>> np.random.rand(3,2) array([[ 0.14022471, 0.96360618], #random [ 0.37601032, 0.25528411], #random [ 0.49313049, 0.94909878]]) #random
likelihoods Module¶
- hddm.likelihoods.add_quantiles_functions_to_pymc_class(pymc_class)¶
add quantiles methods to a pymc class Input:
pymc_class <class>
- hddm.likelihoods.general_WienerCont(err=0.0001, n_st=2, n_sz=2, use_adaptive=1, simps_err=0.001)¶
- hddm.likelihoods.generate_wfpt_stochastic_class(wiener_params=None, sampling_method='cdf', cdf_range=(-5, 5), sampling_dt=0.0001)¶
create a wfpt stochastic class by creating a pymc nodes and then adding quantile functions. Input:
wiener_params <dict> - dictonary of wiener_params for wfpt likelihoods sampling_method <string> - an argument used by hddm.generate.gen_rts cdf_range <sequance> - an argument used by hddm.generate.gen_rts sampling_dt <float> - an argument used by hddm.generate.gen_rts- Ouput:
- wfpt <class> - the wfpt stochastic
- hddm.likelihoods.wiener_like_contaminant(value, cont_x, v, sv, a, z, sz, t, st, t_min, t_max, err, n_st, n_sz, use_adaptive, simps_err)¶
Log-likelihood for the simple DDM including contaminants
utils Module¶
- hddm.utils.EZ(pc, vrt, mrt, s=1)¶
Calculate Wagenmaker’s EZ-diffusion statistics.
Parameters : - pc : float
probability correct.
- vrt : float
variance of response time for correct decisions (only!).
- mrt : float
mean response time for correct decisions (only!).
- s : float
scaling parameter. Default s=1.
Returns : - (v, a, ter) : tuple
drift-rate, threshold and non-decision time
The error RT distribution is assumed identical to the correct RT distrib.
Edge corrections are required for cases with Pc=0 or Pc=1. (Pc=.5 is OK)
Assumptions : - The error RT distribution is identical to the correct RT distrib.
- z=.5 – starting point is equidistant from the response boundaries
- sv=0 – across-trial variability in drift rate is negligible
- sz=0 – across-trial variability in starting point is negligible
- st=0 – across-trial range in nondecision time is negligible
Reference : Wagenmakers, E.-J., van der Maas, H. Li. J., & Grasman, R. (2007).
An EZ-diffusion model for response time and accuracy. Psychonomic Bulletin & Review, 14 (1), 3-22.
Example : >>> EZ(.802, .112, .723, s=.1) (0.099938526231301769, 0.13997020267583737, 0.30002997230248141)
See Also: EZ_data
- hddm.utils.EZ_data(data, s=1)¶
Calculate Wagenmaker’s EZ-diffusion statistics on data.
Arguments : - data : numpy.array
Data array with reaction time data. Correct RTs are positive, incorrect RTs are negative.
- s : float
Scaling parameter (default=1)
Returns : - (v, a, ter) : tuple
drift-rate, threshold and non-decision time
See Also: EZ
- hddm.utils.data_quantiles(data, quantiles=(0.1, 0.3, 0.5, 0.7, 0.9))¶
compute the quantiles of 2AFC data
- Output:
- q_lower - lower boundary quantiles q_upper - upper_boundary_quantiles p_upper - probability of hitting the upper boundary
- hddm.utils.flip_errors(data)¶
Flip sign for lower boundary responses.
Arguments : - data : numpy.recarray
Input array with at least one column named ‘RT’ and one named ‘response’
Returns : - data : numpy.recarray
Input array with RTs sign flipped where ‘response’ == 0
- hddm.utils.gen_ppc_stats(quantiles=(10, 30, 50, 70, 90))¶
Generate default statistics for posterior predictive check on RT data.
Returns : OrderedDict mapping statistic name -> function
- hddm.utils.hddm_parents_trace(model, obs_node, idx)¶
Return the parents’ value of an wfpt node in index ‘idx’ (the function is used by ppd_test)
- hddm.utils.histogram(a, bins=10, range=None, normed=False, weights=None, density=None)¶
Compute the histogram of a set of data.
Parameters : a : array_like
Input data. The histogram is computed over the flattened array.
bins : int or sequence of scalars, optional
If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.
range : (float, float), optional
The lower and upper range of the bins. If not provided, range is simply (a.min(), a.max()). Values outside the range are ignored.
normed : bool, optional
This keyword is deprecated in Numpy 1.6 due to confusing/buggy behavior. It will be removed in Numpy 2.0. Use the density keyword instead. If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that this latter behavior is known to be buggy with unequal bin widths; use density instead.
weights : array_like, optional
An array of weights, of the same shape as a. Each value in a only contributes its associated weight towards the bin count (instead of 1). If normed is True, the weights are normalized, so that the integral of the density over the range remains 1
density : bool, optional
If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function. Overrides the normed keyword if given.
Returns : hist : array
The values of the histogram. See normed and weights for a description of the possible semantics.
bin_edges : array of dtype float
Return the bin edges (length(hist)+1).
See also
histogramdd, bincount, searchsorted, digitize
Notes
All but the last (righthand-most) bin is half-open. In other words, if bins is:
[1, 2, 3, 4]
then the first bin is [1, 2) (including 1, but excluding 2) and the second [2, 3). The last bin, however, is [3, 4], which includes 4.
Examples
>>> np.histogram([1, 2, 1], bins=[0, 1, 2, 3]) (array([0, 2, 1]), array([0, 1, 2, 3])) >>> np.histogram(np.arange(4), bins=np.arange(5), density=True) (array([ 0.25, 0.25, 0.25, 0.25]), array([0, 1, 2, 3, 4])) >>> np.histogram([[1, 2, 1], [1, 0, 1]], bins=[0,1,2,3]) (array([1, 4, 1]), array([0, 1, 2, 3]))
>>> a = np.arange(5) >>> hist, bin_edges = np.histogram(a, density=True) >>> hist array([ 0.5, 0. , 0.5, 0. , 0. , 0.5, 0. , 0.5, 0. , 0.5]) >>> hist.sum() 2.4999999999999996 >>> np.sum(hist*np.diff(bin_edges)) 1.0
- hddm.utils.plot_posterior_quantiles(model, **kwargs)¶
Plot posterior predictive quantiles.
Arguments : model : HDDM model
Optional : - value_range : numpy.ndarray
Range over which to evaluate the CDF.
- samples : int (default=10)
Number of posterior samples to use.
- alpha : float (default=.75)
Alpha (transparency) of posterior quantiles.
- hexbin : bool (default=False)
Whether to plot posterior quantile density using hexbin.
- data_plot_kwargs : dict (default=None)
Forwarded to data plotting function call.
- predictive_plot_kwargs : dict (default=None)
Forwareded to predictive plotting function call.
- columns : int (default=3)
How many columns to use for plotting the subjects.
- save : bool (default=False)
Whether to save the figure to a file.
- path : str (default=None)
Save figure into directory prefix
- hddm.utils.plot_posteriors(model, **kwargs)¶
Generate posterior plots for each parameter.
This is a wrapper for pymc.Matplot.plot()
- hddm.utils.post_pred_check(model, **kwargs)¶
Run posterior predictive check on a model.
Arguments : - model : kabuki.Hierarchical
Kabuki model over which to compute the ppc on.
Optional : - samples : int
How many samples to generate for each node.
- bins : int
How many bins to use for computing the histogram.
- evals : dict
User-defined evaluations of the statistics (by default 95 percentile and SEM). :Example: {‘percentile’: scoreatpercentile}
- plot : bool
Whether to plot the posterior predictive distributions.
- progress_bar: bool
Display progress bar while sampling.
Returns : Hierarchical pandas.DataFrame with the different statistics.
- hddm.utils.qp_plot(model, quantiles=(0.1, 0.3, 0.5, 0.7, 0.9), ncols=None)¶
qp plot Input:
model : HDDM model
- quantiles : sequence
- sequence of quantiles
- ncols : int
- number of columns in output figure