streamline.utils.dataset module
- class streamline.utils.dataset.Dataset(dataset_path, class_label, match_label=None, instance_label=None)
Bases: object
Creates a dataset from the path of a tabular file.
- Parameters:
dataset_path – path of tabular file (as csv, tsv, or txt)
class_label – column label for the outcome to be predicted in the dataset
match_label – column identifying groups of instances that were ‘matched’ while preparing the dataset, e.g. cases and controls matched on some covariates. The match label is used essentially only in cross-validation partitioning, where it keeps all instances with the same match label value in the same partition.
instance_label – column label for instance identifiers; used mostly by the rule-based learner during modeling, to trace heterogeneous subgroups back to the instances in the original dataset
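The effect of match_label on partitioning can be illustrated with a small stand-alone sketch. It does not use STREAMLINE's own partitioning code; the helper `assign_partitions`, the column name `pair_id`, and the round-robin assignment are all assumptions made for illustration. It only shows the stated guarantee: instances sharing a match label value end up in the same partition.

```python
import pandas as pd


def assign_partitions(df, match_label, n_splits):
    """Hypothetical helper: give each row a fold index so that rows sharing
    a match_label value always share a fold."""
    # Unique group ids, in order of first appearance.
    groups = pd.unique(df[match_label])
    # Deal whole groups round-robin across the folds (an assumed strategy).
    fold_of_group = {group: i % n_splits for i, group in enumerate(groups)}
    return df[match_label].map(fold_of_group)


# Toy matched case/control data; "pair_id" plays the role of match_label.
df = pd.DataFrame({
    "pair_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "Class":   [1, 0, 1, 0, 1, 0, 1, 0],
})
folds = assign_partitions(df, "pair_id", n_splits=2)

# Every matched pair lands in exactly one fold.
folds_per_pair = df.assign(fold=folds).groupby("pair_id")["fold"].nunique()
print(folds_per_pair.eq(1).all())
```

For real use, a group-aware splitter such as scikit-learn's `GroupKFold` gives the same guarantee with better fold balancing.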
- clean_data(ignore_features)
Basic data cleaning: drops any instances with a missing outcome value, as well as any features (ignore_features) specified by the user.
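The two cleaning steps described here map onto two pandas calls. The following is a minimal sketch of that behavior, not the method's actual source: `clean_data_sketch` is a hypothetical stand-alone function, and skipping unknown feature names and resetting the index are assumptions.

```python
import numpy as np
import pandas as pd


def clean_data_sketch(df, class_label, ignore_features):
    """Hypothetical stand-in for the cleaning described above."""
    # Step 1: drop instances (rows) whose outcome value is missing.
    cleaned = df.dropna(subset=[class_label])
    # Step 2: drop the user-specified features (columns).
    # errors="ignore" skips names that are not present (an assumption).
    cleaned = cleaned.drop(columns=ignore_features, errors="ignore")
    # Renumber rows after the drop (also an assumption).
    return cleaned.reset_index(drop=True)


raw = pd.DataFrame({
    "age":   [34, 51, 47],
    "site":  ["A", "B", "A"],
    "Class": [1, np.nan, 0],
})
cleaned = clean_data_sketch(raw, class_label="Class", ignore_features=["site"])
print(cleaned)
```

The second row is removed because its outcome is missing, and the `site` column is removed because it was listed in `ignore_features`.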
- feature_only_data()
Creates a features-only version of the dataset for some operations.
- Returns:
dataframe x_data with only features
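One plausible reading of "features only" is the dataframe with the outcome column removed, along with the match and instance label columns when they are set. The sketch below works under that assumption; `feature_only_sketch` is a hypothetical stand-alone function, not the method's actual source.

```python
import pandas as pd


def feature_only_sketch(df, class_label, match_label=None, instance_label=None):
    """Hypothetical stand-in: return a copy of df without the label columns."""
    # Collect whichever label columns were actually specified.
    labels = (class_label, match_label, instance_label)
    non_features = [col for col in labels if col is not None]
    # Assumption: every remaining column is a feature.
    return df.drop(columns=non_features)


data = pd.DataFrame({
    "id":      [101, 102, 103],
    "pair_id": [1, 1, 2],
    "age":     [34, 51, 47],
    "bmi":     [22.1, 27.4, 24.9],
    "Class":   [1, 0, 1],
})
x_data = feature_only_sketch(
    data, class_label="Class", match_label="pair_id", instance_label="id"
)
print(list(x_data.columns))
```

Because `drop` returns a new dataframe, the original data is left unchanged.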