pyspark.sql.DataFrame.semanticHash

DataFrame.semanticHash()

Returns a hash code of the logical query plan of this DataFrame.

New in version 3.1.0.

Changed in version 3.5.0: Supports Spark Connect.

Returns
int

Hash value.

Notes

Unlike the standard hash code, this hash is calculated against a simplified form of the query plan that tolerates cosmetic differences such as attribute names. Two DataFrames whose plans differ only in such details therefore produce the same hash.

This API is a developer API.

Examples

>>> spark.range(10).selectExpr("id as col0").semanticHash()  
1855039936
>>> spark.range(10).selectExpr("id as col1").semanticHash()  
1855039936
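
One possible use of the hash is to bucket DataFrames whose plans are semantically equal, for example to avoid recomputing the same query under different column aliases. The sketch below is a hypothetical helper, not part of PySpark: the name group_by_semantic_hash is an assumption made for illustration. It relies only on DataFrame.semanticHash() and DataFrame.sameSemantics(), and it assumes that two different plans could in principle share a hash value, so each match is confirmed with sameSemantics() before grouping.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_by_semantic_hash(frames: Iterable[Any]) -> List[List[Any]]:
    """Group DataFrame-like objects whose query plans are semantically equal.

    Hypothetical helper (not a PySpark API). Each object only needs the
    DataFrame methods semanticHash() and sameSemantics(other).
    """
    # Maps a hash value to the groups of frames that share it.
    buckets: Dict[int, List[List[Any]]] = defaultdict(list)
    for df in frames:
        groups = buckets[df.semanticHash()]
        for group in groups:
            # Assumption: equal hashes may still be a collision, so the
            # match is confirmed against the group's first member.
            if group[0].sameSemantics(df):
                group.append(df)
                break
        else:
            # No existing group matched: start a new one in this bucket.
            groups.append([df])
    # Flatten the buckets into a single list of groups.
    return [group for groups in buckets.values() for group in groups]
```

Called with the two DataFrames from the examples above plus a third such as spark.range(10).selectExpr("id + 1 as col0"), the helper would be expected to place the first two in one group and the third in a group of its own, since the third differs in more than attribute names.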