Data preprocessing

This operation is also known as data cleaning or data wrangling. It is the process of converting or mapping raw data into another format to prepare it for further analysis.

The main objectives of data preprocessing are:

Identify and handle missing values
Data formatting
Data normalization (centering, scaling)
Data binning (which creates bigger categories from numerical values)
Converting categorical values into numerical variables
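To make the last three objectives more concrete, here is a minimal sketch. The toy DataFrame, the column names "price" and "fuel", and the bin labels are assumptions made for illustration only. Min-max scaling is just one possible way to normalize.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions.
df = pd.DataFrame({
    "price": [10000.0, 20000.0, 30000.0, 40000.0],
    "fuel": ["gas", "diesel", "gas", "gas"],
})

# Normalization (min-max scaling): rescale the values to the range [0, 1].
price_range = df["price"].max() - df["price"].min()
df["price_scaled"] = (df["price"] - df["price"].min()) / price_range

# Binning: group the numerical values into three equal-width categories.
df["price_bin"] = pd.cut(df["price"], bins=3, labels=["low", "medium", "high"])

# Categorical to numerical: one indicator column per category.
dummies = pd.get_dummies(df["fuel"])

print(df)
print(dummies)
```

Each step adds a new column instead of overwriting "price", so the original values stay available for comparison.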

These operations usually work on one column at a time. Each column of a DataFrame is a pandas Series, and each row is called a sample.
To select a specific column, we just type:

df["column_name"]
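As a quick check of the point above, this small sketch selects one column and confirms that it is a Series. The DataFrame contents and the column names "price" and "make" are made up for the example.

```python
import pandas as pd

# Hypothetical data: two samples (rows) and two columns.
df = pd.DataFrame({"price": [13495, 16500], "make": ["audi", "bmw"]})

# Selecting a single column returns a pandas Series.
prices = df["price"]

print(type(prices).__name__)  # Series
print(len(prices))            # one value per sample
```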

Dealing with missing values in Python

Missing data occurs when no value is stored for a variable. It may be represented as "?", "NaN", 0, or just a blank cell. Whenever we see missing data, we have several options:

Contact the person who owns the data source and try to find the missing value
Drop the variable or the data entry
Replace the missing data (with the average of similar data points for numerical values, or with the most frequent value for categorical data)
Use a specific function
Leave it as missing data
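Before choosing one of these options, it helps to see how many values are missing. The sketch below is one possible way to do that, assuming the placeholder "?" marks missing entries. The columns "price" and "doors" are hypothetical. It also shows the categorical case, where the gap is filled with the most frequent value.

```python
import numpy as np
import pandas as pd

# Hypothetical data where "?" and None stand for missing entries.
df = pd.DataFrame({
    "price": ["13495", "?", "16500", "?"],
    "doors": ["four", "two", None, "four"],
})

# Mark the "?" placeholder as a real missing value so pandas can detect it.
df = df.replace("?", np.nan)

# Count the missing values in each column.
missing_counts = df.isnull().sum()
print(missing_counts)

# Categorical column: fill the gap with the most frequent value (the mode).
most_frequent = df["doors"].mode()[0]
df["doors"] = df["doors"].fillna(most_frequent)
```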

dropna: a pandas method (not a separate library) that drops rows or columns containing missing values

axis=0 drops the entire row

axis=1 drops the entire column
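The difference between the two axis values is easier to see on a small example. This is a sketch on made-up data, and the column names "price" and "horsepower" are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price.
df = pd.DataFrame({
    "price": [13495.0, np.nan, 16500.0],
    "horsepower": [111.0, 154.0, 102.0],
})

# axis=0: remove every row that contains a missing value.
rows_dropped = df.dropna(axis=0)

# axis=1: remove every column that contains a missing value.
cols_dropped = df.dropna(axis=1)

print(rows_dropped.shape)          # rows and columns left after dropping rows
print(list(cols_dropped.columns))  # columns left after dropping columns
```

Without inplace=True, dropna returns a new DataFrame and leaves df unchanged.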

Take the mean: mean = df["price"].mean()

Drop missing values: df.dropna(subset=["price"], axis=0, inplace=True)

Replace missing values (replace returns a new Series, so assign the result back to the column):
df["price"] = df["price"].replace(np.nan, mean)
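Put together, the snippets above might look like the following sketch. The three-row DataFrame is invented for the example, and the two strategies are applied to separate copies so the results can be compared.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price.
df = pd.DataFrame({"price": [10000.0, np.nan, 20000.0]})

# mean() skips missing values by default.
mean = df["price"].mean()

# Option 1: replace the missing value with the mean.
filled = df.copy()
filled["price"] = filled["price"].replace(np.nan, mean)

# Option 2: drop the rows where the price is missing.
dropped = df.dropna(subset=["price"], axis=0)

print(filled)
print(dropped)
```

Replacing keeps all samples at the cost of an estimated value, while dropping keeps only observed values at the cost of fewer samples.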

Junior 4 Data Scientist


About Inas AL-Kamachy
