sklift.datasets.fetch_hillstrom
- sklift.datasets.datasets.fetch_hillstrom(target_col='visit', data_home=None, dest_subdir=None, download_if_missing=True, return_X_y_t=False)
Load and return the Kevin Hillstrom MineThatData dataset (classification or regression).
This dataset contains 64,000 customers who last purchased within twelve months. The customers were involved in an e-mail test.
Major columns:
visit (binary): target. 1/0 indicator, 1 = Customer visited website in the following two weeks.
conversion (binary): target. 1/0 indicator, 1 = Customer purchased merchandise in the following two weeks.
spend (float): target. Actual dollars spent in the following two weeks.
segment (str): treatment. The e-mail campaign the customer received.
Read more in the docs.
- Parameters
target_col (str, one of 'visit', 'conversion', 'spend' or 'all', default='visit') – Selects which column of the dataset is used as the target.
data_home (str) – The path to the folder where datasets are stored.
dest_subdir (str) – The name of the folder in which the dataset is stored.
download_if_missing (bool) – Download the data if not present. Raises an IOError if False and data is missing.
return_X_y_t (bool, default=False) – If True, returns (data, target, treatment) instead of a Bunch object.
- Returns
dataset.
- Bunch:
By default, a dictionary-like object with the following attributes:
data (DataFrame object): Dataset without target and treatment.
target (Series or DataFrame object): Column target by values.
treatment (Series object): Column treatment by values.
DESCR (str): Description of the Hillstrom dataset.
feature_names (list): Names of the features.
target_name (str or list): Name of the target.
treatment_name (str): Name of the treatment.
- Tuple:
tuple (data, target, treatment) if return_X_y_t is True
- Return type
Bunch or tuple
References
https://blog.minethatdata.com/2008/03/minethatdata-e-mail-analytics-and-data.html
Example:
from sklift.datasets import fetch_hillstrom

dataset = fetch_hillstrom(target_col='visit')
data, target, treatment = dataset.data, dataset.target, dataset.treatment

# alternative option
data, target, treatment = fetch_hillstrom(target_col='visit', return_X_y_t=True)
See also
fetch_lenta(): Load and return the Lenta dataset (classification).
fetch_x5(): Load and return the X5 RetailHero dataset (classification).
fetch_criteo(): Load and return the Criteo Uplift Prediction Dataset (classification).
fetch_megafon(): Load and return the MegaFon Uplift Competition dataset (classification).
Kevin Hillstrom Dataset: MineThatData
Data description
This is a copy of MineThatData E-Mail Analytics And Data Mining Challenge dataset.
This dataset contains 64,000 customers who last purchased within twelve months. The customers were involved in an e-mail test.
1/3 were randomly chosen to receive an e-mail campaign featuring Mens merchandise.
1/3 were randomly chosen to receive an e-mail campaign featuring Womens merchandise.
1/3 were randomly chosen to not receive an e-mail campaign.
During a period of two weeks following the e-mail campaign, results were tracked. Your job is to tell the world if the Mens or Womens e-mail campaign was successful.
Fields
Historical customer attributes at your disposal include:
Recency: Months since last purchase.
History_Segment: Categorization of dollars spent in the past year.
History: Actual dollar value spent in the past year.
Mens: 1/0 indicator, 1 = customer purchased Mens merchandise in the past year.
Womens: 1/0 indicator, 1 = customer purchased Womens merchandise in the past year.
Zip_Code: Classifies zip code as Urban, Suburban, or Rural.
Newbie: 1/0 indicator, 1 = New customer in the past twelve months.
Channel: Describes the channels the customer purchased from in the past year.
Another variable describes the e-mail campaign the customer received:
Segment
Mens E-Mail
Womens E-Mail
No E-Mail
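Segment takes three values, while many uplift workflows compare one treated group against a control group. The following is a minimal sketch of one possible way to derive a binary treatment flag for a single campaign versus the No E-Mail control. The helper name `binarize_segment` and the sample list are hypothetical illustrations and are not part of scikit-uplift.

```python
from typing import List, Optional

CONTROL = "No E-Mail"  # control label as listed in the Segment field


def binarize_segment(segments: List[str], campaign: str) -> List[Optional[int]]:
    """Map segment labels to 1 (chosen campaign), 0 (control) or None (other campaign).

    Hypothetical helper for illustration only. Rows mapped to None belong to
    the other campaign and would be dropped before a binary comparison.
    """
    flags: List[Optional[int]] = []
    for label in segments:
        if label == campaign:
            flags.append(1)
        elif label == CONTROL:
            flags.append(0)
        else:
            flags.append(None)
    return flags


# Made-up sample of segment labels, not taken from the real data.
sample = ["Mens E-Mail", "No E-Mail", "Womens E-Mail", "No E-Mail"]
womens_flags = binarize_segment(sample, campaign="Womens E-Mail")
print(womens_flags)
```

Keeping the rows of the other campaign as None, rather than mapping them to 0, avoids silently mixing a second treated group into the control group.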
Finally, we have a series of variables describing activity in the two weeks following delivery of the e-mail campaign:
Visit: 1/0 indicator, 1 = Customer visited website in the following two weeks.
Conversion: 1/0 indicator, 1 = Customer purchased merchandise in the following two weeks.
Spend: Actual dollars spent in the following two weeks.
Key figures
Format: CSV
Size: 433KB (compressed) 4,935KB (uncompressed)
Rows: 64,000
Response Ratio:
Average visit rate: 0.15
Average conversion rate: 0.009
Spend: the values in the spend column are unevenly distributed from 0.0 to 499.0
Treatment Ratio: The parts are distributed evenly between the three classes
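The response ratios above are averages over all three segments. To judge whether a campaign was successful, one would typically compare each e-mail segment's response rate with the rate of the No E-Mail group. The sketch below shows that comparison using only the standard library. The function names are hypothetical, and the rows are made-up toy values, not figures from the real dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def visit_rate_by_segment(rows: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return the share of rows with visit == 1 for each segment."""
    totals: Dict[str, int] = defaultdict(int)
    visits: Dict[str, int] = defaultdict(int)
    for segment, visit in rows:
        totals[segment] += 1
        visits[segment] += visit
    return {segment: visits[segment] / totals[segment] for segment in totals}


def uplift_over_control(rates: Dict[str, float], control: str = "No E-Mail") -> Dict[str, float]:
    """Return each campaign's visit rate minus the control group's visit rate."""
    return {
        segment: rate - rates[control]
        for segment, rate in rates.items()
        if segment != control
    }


# Toy (segment, visit) pairs invented for illustration only.
toy_rows = [
    ("Mens E-Mail", 1), ("Mens E-Mail", 0), ("Mens E-Mail", 1), ("Mens E-Mail", 0),
    ("Womens E-Mail", 1), ("Womens E-Mail", 0), ("Womens E-Mail", 0), ("Womens E-Mail", 0),
    ("No E-Mail", 0), ("No E-Mail", 0), ("No E-Mail", 0), ("No E-Mail", 0),
]

rates = visit_rate_by_segment(toy_rows)
uplift = uplift_over_control(rates)
print(rates)
print(uplift)
```

The same difference can be computed for conversion or spend by swapping the outcome column. Because the three segments were assigned at random, this simple difference in means is a reasonable first estimate of each campaign's average effect.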
About Hillstrom
The dataset was provided by Kevin Hillstrom. Kevin is President of MineThatData, a consultancy that helps CEOs understand the complex relationship between Customers, Advertising, Products, Brands, and Channels.
Link to the blog: https://blog.minethatdata.com/