This operation is also called data cleaning or data wrangling. It is the process of converting or mapping data from its initial raw form into another format to prepare it for further analysis.
The main objectives of data preprocessing are:
- Identifying and handling missing values
- Data formatting
- Data normalization (centering, scaling)
- Data binning (grouping numerical values into larger categories)
- Converting categorical values into numerical variables
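A minimal sketch of the normalization, binning, and categorical-conversion steps listed above, assuming a hypothetical DataFrame with illustrative columns "length" and "fuel". It uses z-score and min-max scaling as two common normalizations, pd.cut for equal-width binning, and pd.get_dummies for indicator variables; these are common choices, not the only ones.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "length": [150.0, 170.0, 190.0, 210.0],
    "fuel": ["gas", "diesel", "gas", "gas"],
})

# Normalization (centering + scaling): subtract the mean, divide by the std (z-score).
df["length_z"] = (df["length"] - df["length"].mean()) / df["length"].std()

# Simple feature scaling into the range [0, 1] (min-max).
df["length_scaled"] = (df["length"] - df["length"].min()) / (
    df["length"].max() - df["length"].min()
)

# Binning: group the numerical values into three equal-width categories.
df["length_binned"] = pd.cut(df["length"], bins=3, labels=["short", "medium", "long"])

# Categorical -> numerical: one indicator (dummy) column per category.
dummies = pd.get_dummies(df["fuel"], prefix="fuel")
df = pd.concat([df, dummies], axis=1)

print(df)
```

After this, the "fuel" text column is represented by the indicator columns fuel_diesel and fuel_gas, which most numerical models can use directly.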
Each of these operations is usually applied to one column at a time. In pandas, a column of a DataFrame is a Series, and each row is called a sample.
To access a specific column, type:
df["column_name"]
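A small sketch of column selection, assuming a hypothetical DataFrame with illustrative columns "make" and "price":

```python
import pandas as pd

# Hypothetical DataFrame; "make" and "price" are illustrative column names.
df = pd.DataFrame({"make": ["audi", "bmw", "honda"], "price": [13950, 16430, 6295]})

price = df["price"]      # selecting one column returns a pandas Series
print(type(price))       # <class 'pandas.core.series.Series'>
print(price.iloc[0])     # value of the first sample (row) in that column
```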
Dealing with missing values in Python
Missing data occurs when no value is stored for a variable in an observation. It may be represented as "?", "NaN", 0, or simply a blank cell. Whenever we find missing data, we have several options:
- Check with the person or source that collected the data and try to recover the missing value
- Drop the variable (column) or the data entry (row)
- Replace the missing data: for numerical values, use the average of similar data points; for categorical values, use the most frequent value
- Replace it based on a specific function
- Leave it as missing data
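Before any of these options can be applied, pandas has to recognize the gaps. A minimal sketch, assuming a hypothetical raw dataset in which "?" marks a missing entry: it converts "?" to NaN, counts the missing values per column, and fills a categorical column with its most frequent value.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data in which "?" marks a missing entry.
df = pd.DataFrame({
    "num-of-doors": ["four", "two", "?", "four"],
    "price": ["13495", "?", "16500", "13950"],
})

# Turn the "?" placeholder into a real missing value (NaN) so pandas can detect it.
df = df.replace("?", np.nan)

# Count the missing values in each column.
print(df.isnull().sum())

# Categorical column: replace missing entries with the most frequent value.
most_frequent = df["num-of-doors"].value_counts().idxmax()
df["num-of-doors"] = df["num-of-doors"].replace(np.nan, most_frequent)
```

The numerical "price" column still contains a NaN at this point; it can be dropped or filled with the mean as described next.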
dropna() is a pandas method (not a library) that drops rows or columns containing missing values:
axis=0 drops the entire row
axis=1 drops the entire column
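A short sketch of the two axis values, assuming a hypothetical three-row DataFrame with one missing price. Without inplace=True, dropna() returns a new DataFrame and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price.
df = pd.DataFrame({"make": ["audi", "bmw", "honda"],
                   "price": [13950.0, np.nan, 6295.0]})

rows_dropped = df.dropna(axis=0)   # removes the "bmw" row, keeps both columns
cols_dropped = df.dropna(axis=1)   # removes the "price" column, keeps all rows

print(rows_dropped.shape)  # (2, 2)
print(cols_dropped.shape)  # (3, 1)
```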
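- Take the mean (numpy is needed for np.nan in the replacement step):
import numpy as np
mean = df["price"].mean()
- Drop missing values (remove every row whose "price" is missing):
df.dropna(subset=["price"], axis=0, inplace=True)
- Or replace missing values with the mean (replace() returns a new Series, so assign it back):
df["price"] = df["price"].replace(np.nan, mean)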
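Dropping and replacing are alternatives: once the rows with a missing price are dropped, there is nothing left to replace. A minimal end-to-end sketch, assuming a hypothetical "price" column that was read as text with "?" marking missing entries (so it must be converted to a numeric type before mean() can work):

```python
import numpy as np
import pandas as pd

# Hypothetical "price" column read as text, with "?" marking missing entries.
df = pd.DataFrame({"price": ["13495", "16500", "?", "13950"]})

# Mark "?" as missing and convert to float so that mean() can be computed.
df["price"] = df["price"].replace("?", np.nan).astype(float)

# Option 1: drop the rows with a missing price (returns a new DataFrame).
dropped = df.dropna(subset=["price"], axis=0)

# Option 2: replace missing prices with the column mean (mean() skips NaN).
mean = df["price"].mean()
imputed = df.copy()
imputed["price"] = imputed["price"].replace(np.nan, mean)

print(len(dropped))               # 3
print(imputed["price"].tolist())
```

Dropping loses a whole sample, while mean replacement keeps it at the cost of an estimated value; which is preferable depends on how much data is missing.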