Here's how the raw data looked like prior human readable date time conversion.
To the non-developers, this 9 digit is also known as Unix Timestamp. Well what is Unix? It's an operating system, very similar to what's running in Mac.. In many systems, Unix Timestamp is the way to go for distributing date data because this is a non-expensive way of interpreting and also a safe way to distribute date information. There's a lot of ways to interpret dates such as:
- 2013-12-31T 15:25:46Z
- 12/31/2013 3:25.46.00
- December 31, 2013 3:25.46pm
Now I have this set of data, there are only couple of columns that I want to actually convert to date time. I'm too lazy to convert one column at a time, so let's create functions that can be applied to all the columns.
In this Tutorial, I'll show you how to create two functions that automatically figures out which columns are Unix Time stamp and then using .applymap() to use out of the box pd.to_datetime() functions to it.
Below is the first function to figure out if a given input is a number or a string.
def is_numeric(obj):
attrs = ['__add__', '__sub__', '__mul__', '__div__', '__pow__']
return all(hasattr(obj, attr) for attr in attrs)
In Python, everything is an object. Particularly for numeric objects, which should have attributes to do them addition, subtraction, multiplication, division and exponentiation. A way to look at an object of what kind of attributes it has is using a dir() function.
For Strings, it will have '__add__' and '__mul__', but that's it.
The next function is to figure out if our data is Unix Time stamp, or some other irrelevant numeric data such as the 'apparentTemperatureMax' column. The first thing I noticed is the number of characters in Unix Time stamp is a lot longer.
Timestamps of 10 characters has information all the way to the seconds. 11 characters are milli-seconds, and so on...
Combining all of those information above, below is the code. I'm going to use .applymap() which does the column checks and performs pd.date_time() conversion onto every column.
The .apply() function is meant for 1 column of data. Below is an example:
Code Snippet below in case you want to copy and paste. Please give me credit if done so!
def is_numeric(obj):
attrs = ['__add__', '__sub__', '__mul__', '__div__', '__pow__']
return all(hasattr(obj, attr) for attr in attrs)
#Example of MAP
def convert_dt(x):
if is_numeric(x) and len(str(x))==10: #condition checks is the returned cell is numeric and has len of 10 so it guarantees we're converting date numbers only
return pd.to_datetime(x, unit='s')
elif is_numeric(x) and len(str(x))==9: #Change the amount of Length and which unit for pd.to_datetime accordingly
return pd.to_datetime(x, unit='s')
else:
return x
daily_df.applymap(convert_dt)
