PartitionedFile

PartitionedFile is a part (block) of a file (similar to a Parquet block or an HDFS split).

PartitionedFile represents a chunk of a file that will be read in a partition, together with the partition column values that are appended to each row read from that chunk.
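To make the idea of a file chunk concrete, the following is a minimal sketch of describing a file as consecutive (start, length) ranges. It uses plain Scala only; `FileChunk`, `chunks` and `maxChunkBytes` are hypothetical names introduced for illustration and are not part of Spark, and the splitting rule is an assumption, not necessarily how Spark decides chunk boundaries.

```scala
// Hypothetical stand-in for the (start, length) part of a PartitionedFile.
final case class FileChunk(start: Long, length: Long)

// Describe a file of `fileLength` bytes as consecutive chunks of at most
// `maxChunkBytes` bytes each (both names are assumptions of this sketch).
def chunks(fileLength: Long, maxChunkBytes: Long): Seq[FileChunk] = {
  require(maxChunkBytes > 0, "maxChunkBytes must be positive")
  (0L until fileLength by maxChunkBytes).map { start =>
    // The last chunk may be shorter than maxChunkBytes.
    FileChunk(start, math.min(maxChunkBytes, fileLength - start))
  }
}

// A 25-byte file in 10-byte chunks gives (0, 10), (10, 10), (20, 5).
val exampleChunks = chunks(fileLength = 25L, maxChunkBytes = 10L)
```

Each such chunk would correspond to one PartitionedFile over the same file path, differing only in start and length.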

Creating Instance

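PartitionedFile takes the following to be created: partition column values (partitionValues), file path (filePath), start offset (start), length (length), and block hosts (locations).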

PartitionedFile is created when:

Partition Column Values

partitionValues: InternalRow

PartitionedFile is given an InternalRow with the partition column values to be appended to each row.

The partition column values are the values of the partition columns. They are encoded in the directory structure of the partitioned dataset, not stored in the partitioned files themselves (the directories and the files together make up the partitioned dataset).
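As an illustration of partition values living in the directory structure, here is a minimal sketch that reads `column=value` directory segments from a file path. `partitionValuesFromPath` is a hypothetical helper written in plain Scala and the example path is made up. Spark's own partition discovery does considerably more (for example type inference and escaping), so this only approximates the idea.

```scala
// Hypothetical helper: collect `column=value` directory segments of a path.
def partitionValuesFromPath(path: String): Seq[(String, String)] =
  path.split('/').toSeq
    .dropRight(1)              // the last segment is the file name
    .filter(_.contains("="))   // keep only partition directories
    .map { segment =>
      val idx = segment.indexOf('=')
      segment.substring(0, idx) -> segment.substring(idx + 1)
    }

// Yields Seq(("year", "2024"), ("month", "01")); the file name is ignored.
val exampleValues =
  partitionValuesFromPath("/data/year=2024/month=01/part-00000.parquet")
```

The values come back as strings here, whereas PartitionedFile carries them as an InternalRow.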

Block Hosts

locations: Array[String]

PartitionedFile is given a collection of host names of the nodes that hold the data blocks.

Default: (empty)

String Representation

toString: String

toString is part of the Object (Java) abstraction.


toString is the following text:

path: [filePath], range: [start]-[start+length], partition values: [partitionValues]
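The format above can be sketched with a simplified stand-in class. `SimplePartitionedFile` is hypothetical and keeps the partition values as a plain string to avoid a Spark dependency; only the text layout is taken from the format shown above.

```scala
// Hypothetical stand-in for PartitionedFile (not the Spark class).
final case class SimplePartitionedFile(
    partitionValues: String,
    filePath: String,
    start: Long,
    length: Long) {
  // The range is printed as start and end offsets, where end = start + length.
  override def toString: String =
    s"path: $filePath, range: $start-${start + length}, partition values: $partitionValues"
}

// "path: fakePath0, range: 0-10, partition values: [empty row]"
val exampleText =
  SimplePartitionedFile("[empty row]", "fakePath0", 0L, 10L).toString
```

Note that the second number in the range is the end offset, not the length: a chunk with start 10 and length 5 would print as 10-15.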

Demo

import org.apache.spark.sql.execution.datasources.PartitionedFile
import org.apache.spark.sql.catalyst.InternalRow

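// Arguments: partition values, file path, start offset, length, block hosts
val partFile = PartitionedFile(InternalRow.empty, "fakePath0", 0, 10, Array("host0", "host1"))

// Prints the string representation of the PartitionedFile
println(partFile)
// path: fakePath0, range: 0-10, partition values: [empty row]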