Apache Spark, Parquet, and Troublesome Nulls

I wrote this post a month ago and decided to post it to Medium, for whatever reason. It covers the dangers of column nullability in Spark DataFrames, focusing on DataFrames that are written to and read from Apache Parquet. As always, I hope it is enlightening, and if you see any errors, I would love to discuss them with you.


Written on December 19, 2017