Developer, Data Enthusiast, DFW

Sunday, March 27, 2016

Converting All Unix Timestamps to DateTime in a DataFrame in One Run!

While data munging for my Weather Data Analysis blog, I skipped over a bunch of other valuable data that all needed converting to datetime. I managed to come across a solution today while playing with Pandas' .apply() and .applymap() functions.

Here's how the raw data looked prior to the human-readable datetime conversion.

To the non-developers, this 9- or 10-digit number is known as a Unix timestamp. Well, what is Unix? It's an operating system, very similar to what runs underneath macOS. In many systems, the Unix timestamp is the way to go for distributing date data because it is an inexpensive way to interpret dates and a safe way to distribute date information. There are a lot of ways to write a date, such as:

  • 2013-12-31T15:25:46Z
  • 12/31/2013 3:25.46.00
  • December 31, 2013 3:25.46pm
The ones I've shown above are just off the top of my head. Having worked with systems for a while, I've seen code that doesn't parse dates well get easily tripped up by unexpected inputs. Furthermore, which region is this date time interpreted in? PST? GMT? EST? A Unix timestamp solves all of that because it is the difference between the current time and the epoch time of January 1st, 1970 UTC. More is explained on the Wiki.
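To make the idea concrete, here's a minimal sketch of turning a seconds-since-epoch value into a UTC datetime with the standard library (the timestamp value is just an illustrative example, not from the article's data):

```python
from datetime import datetime, timezone

# 1388503546 seconds after the epoch of 1970-01-01 00:00:00 UTC
ts = 1388503546
dt = datetime.fromtimestamp(ts, tz=timezone.utc)
print(dt.isoformat())  # 2013-12-31T15:25:46+00:00
```

Because the value counts seconds from a fixed UTC instant, every system that receives it decodes the exact same moment, regardless of local time zone.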


Now that I have this set of data, there are only a couple of columns that I actually want to convert to datetime. I'm too lazy to convert one column at a time, so let's create functions that can be applied to all the columns.

In this tutorial, I'll show you how to create two functions that automatically figure out which columns are Unix timestamps, and then use .applymap() to apply the out-of-the-box pd.to_datetime() function to them.


Below is the first function, which figures out whether a given input is a number or a string.

def is_numeric(obj):
    # Note: Python 2's '__div__' was renamed '__truediv__' in Python 3
    attrs = ['__add__', '__sub__', '__mul__', '__truediv__', '__pow__']
    return all(hasattr(obj, attr) for attr in attrs)

In Python, everything is an object. Numeric objects in particular have attributes that implement addition, subtraction, multiplication, division, and exponentiation. One way to see what attributes an object has is the dir() function.



Though only a couple of these attributes are visible in this screenshot, feel free to try the code above in Python and you should be able to see the rest of the attributes mentioned in the is_numeric() function defined above.



Strings, on the other hand, have '__add__' (concatenation) and '__mul__' (repetition), but that's it.
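To see this concretely, here's a quick self-contained check (the example values are assumptions for illustration; the function body matches the one above, using Python 3's '__truediv__' name):

```python
def is_numeric(obj):
    # numeric types implement all five arithmetic dunder methods
    attrs = ['__add__', '__sub__', '__mul__', '__truediv__', '__pow__']
    return all(hasattr(obj, attr) for attr in attrs)

print(is_numeric(1425276000))    # True: ints support all five operations
print(is_numeric(3.14))          # True: so do floats
print(is_numeric('1425276000'))  # False: strings lack __sub__, __truediv__, __pow__
```

This is why the check cleanly separates timestamp-like numbers from string cells in the DataFrame.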


The next function figures out whether our data is a Unix timestamp or some other irrelevant numeric data, such as the 'apparentTemperatureMax' column. The first thing I noticed is that a Unix timestamp has a lot more digits.

Timestamps of 10 digits carry information all the way down to the second. 13 digits means milliseconds, and so on...
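A quick sketch of the digits-to-units relationship, using pd.to_datetime's unit parameter (the timestamp value is an assumed example):

```python
import pandas as pd

# the same instant, written with 10 digits (seconds) and 13 digits (milliseconds)
sec = pd.to_datetime(1388503546, unit='s')
ms = pd.to_datetime(1388503546000, unit='ms')
print(sec)        # 2013-12-31 15:25:46
print(sec == ms)  # True: same instant, different units
```

So counting the digits in a cell tells you which unit to pass when converting.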

Combining all of the information above, below is the code. I'm going to use .applymap(), which runs the check and performs the pd.to_datetime() conversion on every column.




The .apply() function, by contrast, is meant for a single column (a Series) of data. Below is an example:
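A minimal sketch of the single-column case, using a hypothetical Series of 10-digit Unix timestamps (the values are illustrative, not from the article's data):

```python
import pandas as pd

# hypothetical column of seconds-based Unix timestamps
times = pd.Series([1388503546, 1388589946])
converted = times.apply(lambda x: pd.to_datetime(x, unit='s'))
print(converted)
```

Here .apply() visits each value of the one Series, whereas .applymap() visits every cell of the whole DataFrame.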


Code snippet below in case you want to copy and paste. Please give me credit if you do!
import pandas as pd

def is_numeric(obj):
    # Note: Python 2's '__div__' was renamed '__truediv__' in Python 3
    attrs = ['__add__', '__sub__', '__mul__', '__truediv__', '__pow__']
    return all(hasattr(obj, attr) for attr in attrs)

# Example of MAP
def convert_dt(x):
    # the cell must be numeric AND 10 digits long, so we're
    # guaranteed to be converting seconds-based date numbers only
    if is_numeric(x) and len(str(x)) == 10:
        return pd.to_datetime(x, unit='s')
    # 9-digit timestamps (dates before Sept 2001) are also in seconds;
    # change the length and the unit for pd.to_datetime accordingly
    elif is_numeric(x) and len(str(x)) == 9:
        return pd.to_datetime(x, unit='s')
    else:
        return x

daily_df.applymap(convert_dt)
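Putting it together on a toy DataFrame (the column names and values here are hypothetical, echoing the article's weather data; your own daily_df would take their place):

```python
import pandas as pd

def is_numeric(obj):
    attrs = ['__add__', '__sub__', '__mul__', '__truediv__', '__pow__']
    return all(hasattr(obj, attr) for attr in attrs)

def convert_dt(x):
    # only 10-digit numeric cells are treated as seconds-based timestamps
    if is_numeric(x) and len(str(x)) == 10:
        return pd.to_datetime(x, unit='s')
    return x

# hypothetical frame: a timestamp column next to an ordinary numeric column
df = pd.DataFrame({'time': [1388503546, 1388589946],
                   'apparentTemperatureMax': [45.2, 50.1]})
print(df.applymap(convert_dt))
```

Note that 'apparentTemperatureMax' passes through untouched, since its values never have 10 digits; only the 'time' column gets converted.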
