SmartDrill Home

   
   Search the SmartDrill site

 
 

 

Securities Brokerage CHAID Model



 

Note: if you are not highly familiar with CHAID segmentation modeling techniques, we suggest that you first visit our Introduction to CHAID page, and then return here to continue reading the brokerage case study.

 

Introduction

This case study documents a predictive market segmentation model designed to identify and profile high-value brokerage customer segments as targets for special marketing communications efforts to encourage use of the brokerage firm's online investment services. The dependent variable for this ordinal CHAID model is online brokerage account commission dollars during the past 12 months. Predictors include proprietary client data (various account status and trading behavior variables) as well as syndicated demographic and lifestyle variables.

We begin by splitting the client's entire customer file into a modeling sample and a validation sample. (Once the model is built using the modeling sample, we apply it to the validation sample to see how well it works on a sample other than the one on which it was built).

 

CHAID Segmentation Model

CHAID (Chi-square Automatic Interaction Detection) is a predictive, tree-based segmentation technique that uses a dependent or criterion variable as the seed for the segmentation.  The tree is built by splitting an initial predictor variable into categories or ranges of values such that each category or range of values represents a statistically significant difference on the dependent variable.  Additional predictors may be added under each category or value range of the first predictor, adding more branches to the segmentation model.  A stopping algorithm determines when the model is complete, and the final “twigs” of the segmentation tree represent the final segments.

Such a tree can be developed in automatic tree-generation mode, or it can be handcrafted by manually selecting predictors to use at various levels of the tree.  Often a tree is first generated automatically; then, based on the specific business objectives of the analysis, as well as knowledge of the product/service category being analyzed, the tree is manually revised.

In our example, the resulting CHAID model has 55 segments. [NOTE: while a typical CHAID segmentation model might optimally contain perhaps 12-15 segments, occasionally it is useful to develop a more extensive, more granular segmentation, which can sometimes help us identify and understand important “niche” segments that represent either a significant problem or an unusually lucrative opportunity. A more extensive segmentation model also allows us to take finer cuts at a marketing database.  This can yield targeting precision that is functionally equivalent to that achieved by regression modeling, while also giving us a market segmentation that regression techniques are unable to provide.]

For reasons of confidentiality, we will not display the segmentation tree diagram here. However, the results are summarized in the following comb chart, showing the segment indexes (indexes of average dollar value), and in the associated Gains table, farther down.

 

CHAID segments bar graph showing annual commission dollars by segment

This summary comb chart is a quick way to confirm that the CHAID segmentation model discriminates degrees of customer value quite well. The top segment has an index of just over 900, reflecting an average annual commission value of just over $1,000. In contrast, the bottom segment has an index of just two, reflecting an average annual commission of less than three dollars.

Next is the gains table, which provides quantitative detail useful for financial and marketing planning. In the gains table, we have highlighted the top 20% of the file in blue, the remaining above-average segments in green, the bottom 20% of the file in bright red the remaining below-average segments in dark red. Among other things, we can see that the top 20% of the file is worth an average of about $334 per account, which is nearly three times the average account value for the entire sample.

CHAID Gains table: Average Annual Brokerage Commission Dollars

Basic Segment Statistics

Cumulative Statistics

Seg. Number

Seg. Size

Percent of all

Avg. $ value

Index of avg. $ value

Cum. size

Cum. % of all

Cum. avg. $ value

Cum. index of avg. $ value

55

545

1.4

1025

907

545

1.4

1025

907

43

417

1.1

862

763

962

2.5

954

845

54

469

1.2

604

534

1,431

3.7

839

743

53

449

1.2

359

318

1,880

4.9

725

641

42

617

1.6

354

313

2,497

6.5

633

560

51

467

1.2

263

233

2,964

7.7

575

509

2

382

1

259

230

3,346

8.7

539

477

40

556

1.4

257

228

3,902

10.2

499

441

46

347

0.9

204

181

4,249

11.1

475

420

29

927

2.4

174

154

5,176

13.5

421

372

11

658

1.7

160

142

5,834

15.2

391

346

35

484

1.3

159

141

6,318

16.4

373

331

1

924

2.4

153

135

7,242

18.8

345

306

16

439

1.1

147

130

7,681

20

334

296

52

866

2.3

134

119

8,547

22.2

314

278

44

476

1.2

132

117

9,023

23.5

304

269

39

360

0.9

129

115

9,383

24.4

297

263

3

966

2.5

127

112

10,349

26.9

282

249

7

807

2.1

125

111

11,156

29

270

239

38

725

1.9

111

98

11,881

30.9

261

231

14

1,081

2.8

97.13

86

12,962

33.7

247

219

32

1,123

2.9

96.27

85

14,085

36.7

235

208

28

583

1.5

94.31

83

14,668

38.2

229

203

41

339

0.9

93.26

83

15,007

39.1

226

200

8

842

2.2

93.14

82

15,849

41.3

219

194

48

374

1

90.56

80

16,223

42.2

216

191

25

760

2

84.94

75

16,983

44.2

210

186

34

627

1.6

84.68

75

17,610

45.8

206

182

6

920

2.4

66.48

59

18,530

48.2

199

176

36

1,363

3.5

57.97

51

19,893

51.8

189

168

4

384

1

53.51

47

20,277

52.8

187

165

12

2,314

6

50.36

45

22,591

58.8

173

153

21

676

1.8

50.28

44

23,267

60.6

169

150

18

2,151

5.6

46.43

41

25,418

66.2

159

141

13

498

1.3

45.85

41

25,916

67.5

157

139

24

674

1.8

45.04

40

26,590

69.2

154

136

9

906

2.4

43.81

39

27,496

71.6

150

133

33

605

1.6

40.88

36

28,101

73.1

148

131

47

386

1

39.57

35

28,487

74.1

146

130

22

491

1.3

37.76

33

28,978

75.4

145

128

45

458

1.2

37.56

33

29,436

76.6

143

126

30

391

1

31.35

28

29,827

77.6

141

125

26

562

1.5

27.84

25

30,389

79.1

139

123

10

763

2

27.59

24

31,152

81.1

137

121

15

305

0.8

24.64

22

31,457

81.9

135

120

49

617

1.6

24.54

22

32,074

83.5

133

118

5

321

0.8

23.78

21

32,395

84.3

132

117

31

432

1.1

23.44

21

32,827

85.4

131

116

23

336

0.9

15.95

14

33,163

86.3

130

115

17

632

1.6

15.66

14

33,795

88

128

113

20

396

1

12.36

11

34,191

89

126

112

37

1,071

2.8

9.2

8

35,262

91.8

123

109

27

2,203

5.7

6.39

6

37,465

97.5

116

102

50

578

1.5

3.27

3

38,043

99

114

101

19

377

1

2.33

2

38,420

100

113

100



Additional Analyses

We can use the data in the gains table to perform various financial calculations. For example, by multiplying the size of one or more segments by the average segment dollar value, we get a total value. Using this information, we can better plan our communications/promotion budget. Similar calculations performed on the under-performing segments provide information about potential cost savings achieved by reducing marketing efforts directed at these under-performers.

We can perform additional diagnostics on the most lucrative segments to identify those customers who are below average for their segment. This means that they have the same or similar characteristics as their better-performing cohorts on the model's predictive variables, but are not providing as high a level of commission dollars. By using the demographic, lifestyle and trading behavior variables to define them, we can develop marketing and advertising communications strategies tailored to them, the goal being to convince them to trade at levels similar to their segment’s more lucrative counterparts.

There are many other decisions which the gains table and the segmentation rules can help us make. For example, we might wish to conduct some market research among customers in under-performing segments, or among under-performing customers in the better segments. We can use the variables that comprise the segment definitions to help us identify possible issues and question areas to include in the survey.


CHAID Model Validation

Before we try to apply a CHAID model, we perform a validation against a holdout sample, to confirm that it is a good model. In this example, our database was split randomly into equal-sized modeling and validation samples. We can reconstruct the CHAID segmentation model on the validation sample, and examine the results. For example, when we perform correlation analyses on the segments from the two samples, we obtain a correlation of approximately 0.98 for the segment sizes and approximately .97 for  the segments' average dollar value, indicating a very high degree of correspondence.  

Back to Top

Back to Marketing Analytics page 

 

· Marketing Analytics 
· Market Research
· Operations Research
· Risk/Decision Analysis
· Project Management

         

SSL certification seal from Comodo