Data preprocessing




This operation is also known as data cleaning or data wrangling. It is the process of converting or mapping raw data into another format to prepare it for further analysis.


The main objectives of data preprocessing are: 

  • Identify and handle missing values
  • Data formatting
  • Data normalization (centering, scaling)
  • Data binning (which creates bigger categories from numerical values)
  • Convert categorical values into numerical variables
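
As a rough illustration of the last three objectives, here is a minimal sketch on a made-up table. The column names ("price", "fuel"), the values, and the choice of min-max scaling and two bins are assumptions for the example, not something prescribed by this post.

```python
import pandas as pd

# Hypothetical toy data; names and values are made up for illustration.
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "fuel": ["gas", "diesel", "gas", "gas"],
})

# Normalization (min-max scaling): rescale "price" to the range [0, 1].
price_min = df["price"].min()
price_max = df["price"].max()
df["price_scaled"] = (df["price"] - price_min) / (price_max - price_min)

# Binning: group the numerical prices into two equal-width categories.
df["price_bin"] = pd.cut(df["price"], bins=2, labels=["low", "high"])

# Categorical to numerical: one indicator column per category.
dummies = pd.get_dummies(df["fuel"])

print(df)
print(dummies)
```

Min-max scaling is only one way to normalize; subtracting the mean and dividing by the standard deviation is another common choice.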

These operations are usually applied to a column, which is a pandas Series; each row is called a sample.
To select a specific column, we just type:

df["column_name"]
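
A small sketch of what this returns, assuming a made-up DataFrame with a single "price" column (the values are hypothetical):

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"price": [13495, 16500, 13950]})

# Single brackets with one column name return a pandas Series.
price = df["price"]
print(type(price).__name__)  # Series

# Each entry in the Series corresponds to one row (sample).
print(len(price))  # 3
```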



Dealing with missing values in Python

Missing data occurs when no value is stored for a variable. It could be represented as "?", "NaN", 0, or just a blank cell. Whenever we see missing data, we have several options:

  • Contact the person who owns the data source and try to find the missing value
  • Drop the missing data (either the whole variable or just the data entry)
  • Replace the missing data (with the average of similar data points for numerical values, or with the most frequent value for categorical data)
  • Use a specific function to fill it in
  • Leave it as missing data
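
Before choosing one of these options, it helps to see how much data is actually missing. The sketch below assumes a made-up table where "?" marks missing prices; the column names and values are hypothetical. It also shows the "most frequent value" replacement for a categorical column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "?" and None both stand for missing entries.
df = pd.DataFrame({
    "price": ["13495", "?", "16500", "?"],
    "fuel": ["gas", "gas", None, "diesel"],
})

# Turn the "?" placeholder into NaN so pandas treats it as missing,
# then convert the column from text to float.
df["price"] = df["price"].replace("?", np.nan).astype(float)

# Count the missing values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Categorical column: replace missing entries with the most frequent value.
most_frequent = df["fuel"].mode()[0]
df["fuel"] = df["fuel"].fillna(most_frequent)
```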

dropna() is a pandas method that drops rows or columns containing missing values:
axis=0 drops the entire row
axis=1 drops the entire column
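
To make the difference between the two axes concrete, here is a minimal sketch on a made-up DataFrame (the "price" and "doors" columns are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price.
df = pd.DataFrame({
    "price": [13495.0, np.nan, 16500.0],
    "doors": [2, 4, 4],
})

# axis=0: drop every row that contains a missing value.
rows_dropped = df.dropna(axis=0)
print(rows_dropped.shape)  # (2, 2)

# axis=1: drop every column that contains a missing value.
cols_dropped = df.dropna(axis=1)
print(cols_dropped.shape)  # (3, 1)
```

Without inplace=True, dropna() returns a new DataFrame and leaves the original unchanged.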


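  • Take the mean (missing values are skipped)
    mean = df["price"].mean()
  • Drop missing values
    df.dropna(subset=["price"], axis=0, inplace=True)
  • Replace missing values with the mean instead of dropping them (assumes import numpy as np)
    df["price"] = df["price"].replace(np.nan, mean)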
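
Put together as a runnable sketch, replacing missing prices with the mean could look like this. The data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price.
df = pd.DataFrame({"price": [10.0, np.nan, 30.0]})

# mean() ignores NaN by default, so this is (10 + 30) / 2.
mean = df["price"].mean()

# replace() returns a new Series, so assign the result back to the column.
df["price"] = df["price"].replace(np.nan, mean)

print(df["price"].tolist())  # [10.0, 20.0, 30.0]
```

Writing df["price"].fillna(mean) should give the same result here and is arguably the more direct way to express it.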











About Inas AL-Kamachy
