PySpark is one of Python's most popular interfaces for big data. It is the Python API for Apache Spark, an open-source engine for distributed processing of big data workloads, and it lets you write Spark applications in Python and analyze your data interactively.
Convert string to date in PySpark
Two methods are available to convert a string to a date in PySpark:
- Convert string to date with the to_date function.
- Convert string to date with Spark SQL.
The first method uses the to_date function. Use it when you have a DataFrame with a string column that holds date values.
As the name suggests, to_date(column, format) converts a string column in a DataFrame to a date column. It takes two arguments: the first is the string column, and the second is a date pattern that describes how the dates are written, for example yyyy-MM-dd. Note that Spark date patterns are case-sensitive: yyyy is the year, MM is the month, and dd is the day. A typical call looks like to_date(col("string_column_name"), "yyyy-MM-dd"). To use the function on a real DataFrame, you first need to import it.
First run from pyspark.sql.functions import col, to_date and then
df2 = df1.select(col("column_name"), to_date(col("column_name"), "yyyy-MM-dd").alias("to_date"))
Calling df2.show() displays the converted result.
Here is what each part of the syntax means:
- df1: the DataFrame that contains the string column to convert.
- df2: the new DataFrame created by the conversion.
- to_date: the function that converts the string column to a date column.
- yyyy-MM-dd: the date pattern; it can also be MM-dd-yyyy or yyyy-dd-MM, depending on how the dates are written.
- alias: gives the new column a name of its own, so it is shorter and easier to read.
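To see how such a pattern drives the parsing without needing a Spark cluster, here is a minimal pure-Python sketch of the same idea. Note the assumption: Spark's to_date uses Java-style pattern letters (yyyy, dd, MM), while plain Python's datetime.strptime uses its own codes (%Y, %d, %m); the helper name parse_like_to_date is hypothetical, chosen only for illustration.

```python
from datetime import datetime

# Spark's to_date("2018-18-07", "yyyy-dd-MM") reads year, then day, then month.
# The plain-Python analogue uses strptime codes: %Y = yyyy, %d = dd, %m = MM.
def parse_like_to_date(s, fmt="%Y-%d-%m"):
    """Parse a date string and return it in ISO yyyy-MM-dd form."""
    return datetime.strptime(s, fmt).date().isoformat()

print(parse_like_to_date("2018-18-07"))  # year-day-month input -> 2018-07-18
```

The key point the sketch illustrates is that the pattern must match the order of the fields in the string, otherwise parsing fails or produces wrong dates.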
A simple example DataFrame:
df1 = spark.createDataFrame(
    data=[("1", "Angela", "2018-18-07 14:01:23.000"),
          ("2", "Amandy", "2018-21-07 13:04:29.000"),
          ("3", "Michelle", "2018-24-07 06:03:13.009")],
    schema=["Id", "CustomerName", "timestamp"])
df1.printSchema()
The timestamp column in this case stores the date as yyyy-dd-MM (year, day, month), followed by a time. We will now convert it to a date column.
For the conversion, we write
from pyspark.sql.functions import col, to_date
df2 = df1.select(col("timestamp"), to_date(col("timestamp"), "yyyy-dd-MM").alias("to_date"))
df2.show()
Note that on Spark 3 and later, the stricter parser may fail or return null for strings that carry a trailing time portion; in that case use the full pattern "yyyy-dd-MM HH:mm:ss.SSS" or enable the legacy parser.
The strings have now been converted to dates in a new column, shown below:
+----------+----------+
|     input|   to_date|
+----------+----------+
|2018-18-07|2018-07-18|
|2018-21-07|2018-07-21|
|2018-24-07|2018-07-24|
+----------+----------+
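The sample strings above actually carry a time component (e.g. "2018-18-07 14:01:23.000") that is discarded in the conversion. As a hedged pure-Python sketch of that step, no Spark required, the helper below (timestamp_to_date is a hypothetical name for illustration) parses the full timestamp and keeps only the date:

```python
from datetime import datetime

# The sample strings include a time part, e.g. "2018-18-07 14:01:23.000".
# %f consumes the fractional seconds; .date() drops the time component,
# mirroring what to_date does when given a full timestamp pattern.
def timestamp_to_date(s):
    return datetime.strptime(s, "%Y-%d-%m %H:%M:%S.%f").date().isoformat()

print(timestamp_to_date("2018-18-07 14:01:23.000"))  # -> 2018-07-18
```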
In the second method, we use Spark SQL. We will first convert a single timestamp string, the one from Angela's row.
spark.sql("select '2018-18-07' as input, to_date('2018-18-07', 'yyyy-dd-MM') as to_date").show()
The string is parsed with the yyyy-dd-MM pattern, and the resulting table is:
+----------+----------+
|     input|   to_date|
+----------+----------+
|2018-18-07|2018-07-18|
+----------+----------+
As the result shows, the same conversion can be done entirely in Spark SQL.
Conclusion
Either of the two methods above converts a string to a date in PySpark, but the first one, the to_date function, is recommended because it is simpler and more commonly used.
ITtutoria
ITtutoria is a tutorial platform for developers with tutorials on a wide range of software. It is widely used by developers learning tools such as Python, PySpark, and Apache Spark.
ITtutoria hosts tutorials on how to convert a string to a date in PySpark with the to_date function or with Spark SQL.
It also offers answers and walkthroughs on related topics, such as creating pandas DataFrames in Python, along with further tutorials on PySpark and its coding methods.