# Summary: Knowledge Management And Business Intelligence


## Read the summary and the most important questions on Knowledge Management and Business Intelligence

• ### 1.1 Preprocessing


• #### What are the steps in preprocessing?

Identify data sources
Select data
Clean data
Transform data
• #### What types of sample bias are there?

Sample selection bias: consider the selection mechanism
Seasonality effects: consider the handling of time
• #### How to treat missing values?

1. Remove:
Eliminate rows or columns, although this could mean deleting useful information

2. Replace missing values
- acquire true values: contact, purchase
- imputation techniques: replace by mean, prediction

3. Keep!
- add a variable called missing, or introduce a dummy variable that flags the missing values

4. Weight-of-evidence
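
The "replace" and "keep" options can be combined. The following is a minimal sketch, assuming missing values are encoded as `None` and that mean imputation is acceptable for the variable; the function name `impute_mean_with_flag` and the income data are hypothetical.

```python
from statistics import mean

def impute_mean_with_flag(values):
    """Replace None by the mean of the observed values and
    return an extra 0/1 column that flags which values were missing."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    imputed = [fill if v is None else v for v in values]
    missing_flag = [1 if v is None else 0 for v in values]
    return imputed, missing_flag

# Hypothetical income column with two missing entries
income = [2000, None, 3000, 4000, None]
imputed, flag = impute_mean_with_flag(income)
print(imputed)
print(flag)
```

Keeping the flag column preserves the information that a value was missing, which may itself be predictive.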
• #### How to detect and treat outliers?

Z = (x - mean) / st.dev
If |Z| > 3, x could be an outlier

Reduce the impact by capping values at z = 3, or replace them with the 99th percentile

Multivariate outliers: observations that only stand out when multiple dimensions are considered simultaneously. These are often just ignored
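
The univariate rule can be sketched as follows, assuming the sample standard deviation is used and that capping is done symmetrically at `mean ± 3 * st.dev`; the helper names and the data are hypothetical. Note that in very small samples no point can reach |Z| > 3, so the example uses 20 observations.

```python
from statistics import mean, stdev

def z_scores(values):
    """Z = (x - mean) / st.dev for every observation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

def cap_at_z(values, z_max=3.0):
    """Truncate values to the interval mean +/- z_max * st.dev."""
    m, s = mean(values), stdev(values)
    lo, hi = m - z_max * s, m + z_max * s
    return [min(max(x, lo), hi) for x in values]

# Hypothetical data: 19 ordinary values and one extreme value
data = [10.0] * 19 + [100.0]
flags = [abs(z) > 3 for z in z_scores(data)]
capped = cap_at_z(data)
print(flags.count(True))  # number of flagged observations
print(capped[-1])         # extreme value after capping
```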
• #### What is feature engineering?

Enrich the data set so as to increase predictive performance

For instance, time-flattening: removing the time dimension by defining features that summarize the performance period.
Or transforming unstructured data into structured data.
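
As an illustration of time-flattening, the sketch below collapses a transaction history into one feature row per customer. The tuple layout `(customer_id, month, amount)` and the chosen summary features are assumptions made for the example, not part of the original notes.

```python
from collections import defaultdict

def time_flatten(transactions):
    """Summarize (customer_id, month, amount) records into one
    row of time-independent features per customer."""
    by_customer = defaultdict(list)
    for customer_id, month, amount in transactions:
        by_customer[customer_id].append((month, amount))

    features = {}
    for customer_id, rows in by_customer.items():
        amounts = [amount for _, amount in rows]
        features[customer_id] = {
            "n_transactions": len(rows),
            "total_amount": sum(amounts),
            "mean_amount": sum(amounts) / len(amounts),
            "last_month": max(month for month, _ in rows),
        }
    return features

# Hypothetical transaction log
transactions = [("a", 1, 50.0), ("a", 2, 70.0), ("b", 2, 20.0), ("a", 3, 30.0)]
features = time_flatten(transactions)
print(features["a"])
```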
• #### What is variable transformation?

Normalization: rescale variables to typically [0,1]
Standardisation: rescale data to have a mean of zero and a st.dev of one.
Transformation: to a normal distribution

Advanced transformations: Box-Cox, Yeo-Johnson, Principal Component Analysis
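
The two basic rescalings can be sketched as below, assuming min-max scaling for normalization and the population standard deviation for standardisation; the function names and the age values are hypothetical.

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def standardize(values):
    """Rescale values to mean zero and st.dev one."""
    m, s = mean(values), pstdev(values)
    return [(x - m) / s for x in values]

ages = [20, 30, 40, 50, 60]
normalized = min_max_normalize(ages)
standardized = standardize(ages)
print(normalized)
print(standardized)
```

Both functions assume the variable is not constant; otherwise the denominator is zero.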
• #### How to handle coarse classifications?

Pivoting tables and regrouping categories in order to create more distinction between the groups. Candidate groupings are compared via the Chi-squared test: the bigger its value, the better.
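
A sketch of how two candidate groupings could be compared, assuming the Pearson chi-squared statistic on a table of good/bad counts per group. The categories A to D and their counts are made up for illustration, and the comparison is only fair when both groupings have the same number of groups.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical (good, bad) counts per original category
counts = {"A": (90, 10), "B": (85, 15), "C": (50, 50), "D": (45, 55)}

def merge(categories):
    """Add up the good and bad counts of the categories in one group."""
    return [sum(counts[c][k] for c in categories) for k in (0, 1)]

grouping_1 = [merge(["A", "B"]), merge(["C", "D"])]  # similar categories together
grouping_2 = [merge(["A", "C"]), merge(["B", "D"])]  # dissimilar categories together
chi_1 = chi_squared(grouping_1)
chi_2 = chi_squared(grouping_2)
print(chi_1, chi_2)
```

Here the first grouping yields the larger statistic, so it separates goods from bads better.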
• #### Why change a continuous variable to categorical?

Interpretability: some prefer age segments

Allows non-linear relations to be incorporated within a linear model, and thus improves performance

Sometimes for anonymization, or different applications.
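
A minimal sketch of turning age into segments and then into dummy variables, which is what lets a linear model assign a separate coefficient to each segment. The boundaries and labels are arbitrary choices for the example.

```python
from bisect import bisect_right

# Hypothetical segment boundaries: a value equal to a boundary
# falls into the segment that starts at that boundary.
BOUNDARIES = [25, 45, 65]
LABELS = ["<25", "25-44", "45-64", "65+"]

def to_segment(age):
    """Map a continuous age to its segment label."""
    return LABELS[bisect_right(BOUNDARIES, age)]

def one_hot(label):
    """Encode a segment label as 0/1 dummy variables."""
    return [1 if label == candidate else 0 for candidate in LABELS]

segments = [to_segment(age) for age in [18, 25, 44, 45, 70]]
dummies = one_hot(to_segment(44))
print(segments)
print(dummies)
```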
• #### How does weight-of-evidence work?

Why take the ln of the "relative odds" and not the absolute odds? This way WOE is independent of the class distribution and permits easy interpretation.

Information Value: IV = Sum((distr.good.cat - distr.bad.cat) * woe.cat), summed over all categories

Category boundaries can be given so as to maximise the predictive powers in terms of IV

The number of categories is a trade-off: fewer is simpler, more keeps predictive power.

Binning: the question is whether it is done with or without interaction
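
The calculation can be sketched as follows, assuming the common definition woe.cat = ln(distr.good.cat / distr.bad.cat), where distr.good.cat is the share of all goods that falls in the category (and likewise for bads). The function name `woe_iv` and the counts are hypothetical, and the sketch does not handle categories with zero goods or zero bads.

```python
from math import log

def woe_iv(counts):
    """counts maps category -> (n_good, n_bad).
    Returns the WOE per category and the total Information Value."""
    total_good = sum(good for good, _ in counts.values())
    total_bad = sum(bad for _, bad in counts.values())
    woe, iv = {}, 0.0
    for category, (good, bad) in counts.items():
        distr_good = good / total_good
        distr_bad = bad / total_bad
        woe[category] = log(distr_good / distr_bad)
        iv += (distr_good - distr_bad) * woe[category]
    return woe, iv

# Hypothetical age segments with (good, bad) counts
counts = {"young": (100, 50), "middle": (300, 30), "old": (100, 20)}
woe, iv = woe_iv(counts)

# Ten times as many bads in every category changes the class
# distribution but leaves the WOE values unchanged.
scaled = {c: (good, bad * 10) for c, (good, bad) in counts.items()}
woe_scaled, _ = woe_iv(scaled)
print(woe)
print(iv)
```

A category with WOE above zero holds relatively more goods than bads, and a WOE of zero means the category does not discriminate.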
• #### What are the pros and cons of WOE?

All-in-one solution:
- categorical to continuous
- continuous to categorical
- missing values
- outliers
- assessment of predictive strength
- nonlinear relations in a linear, interpretable model

Drawbacks: some loss of predictive power?