== HadoopFileLinesReader

`HadoopFileLinesReader` is a Scala `Iterator` of Apache Hadoop's `org.apache.hadoop.io.Text`.

`HadoopFileLinesReader` is created by the following:

- `SimpleTextSource`
- `LibSVMFileFormat`
- `TextInputCSVDataSource`
- `TextInputJsonDataSource`
- `TextFileFormat`

`HadoopFileLinesReader` uses the internal <<iterator, iterator>> to read the lines of the file.

=== Creating Instance

`HadoopFileLinesReader` takes the following when created:

- [[file]] `PartitionedFile`
- [[conf]] Hadoop's `Configuration`
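
As a minimal sketch of creating an instance, assuming the Spark 2.x signatures (where `PartitionedFile` takes `partitionValues`, `filePath`, `start`, `length` and optional `locations`, and `filePath` is a URI string), the following reads every line of a file. `CreateReaderSketch` and `readAll` are hypothetical names used for illustration only. Later Spark versions changed `PartitionedFile`, so the constructor call may need adjusting.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.{HadoopFileLinesReader, PartitionedFile}

// Hypothetical helper, not part of Spark.
object CreateReaderSketch {
  // `uri` is a URI string such as "file:///tmp/lines.txt";
  // `length` is the number of bytes to read, starting at offset 0.
  def readAll(uri: String, length: Long): List[String] = {
    // Assumption: Spark 2.x PartitionedFile(partitionValues, filePath, start, length)
    val file = PartitionedFile(InternalRow.empty, uri, 0L, length)
    val reader = new HadoopFileLinesReader(file, new Configuration())
    try {
      // Copy each Text to a String and force the iterator before closing the reader.
      reader.map(_.toString).toList
    } finally {
      reader.close()
    }
  }
}
```

Converting each `Text` to a `String` before the reader is closed avoids holding on to a Hadoop object that the underlying record reader may reuse between lines.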

=== [[iterator]] `iterator` Internal Property

[source, scala]
iterator: RecordReaderIterator[Text]

When created, `HadoopFileLinesReader` creates an internal `iterator` that uses Hadoop's https://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/mapreduce/lib/input/FileSplit.html[org.apache.hadoop.mapreduce.lib.input.FileSplit] with Hadoop's https://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/fs/Path.html[org.apache.hadoop.fs.Path] and the <<file, file>>.

`iterator` creates Hadoop's `TaskAttemptID`, `TaskAttemptContextImpl` and `LineRecordReader`.

`iterator` initializes `LineRecordReader` and passes it on to a `RecordReaderIterator`.
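
The steps above can be sketched with Hadoop classes alone. This is an illustration under stated assumptions, not Spark's actual code: `LineReaderSketch` and `readLines` are hypothetical names, the task and job identifiers are placeholders (no MapReduce job is running), and a plain loop stands in for Spark's `RecordReaderIterator`.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}
import org.apache.hadoop.mapreduce.lib.input.{FileSplit, LineRecordReader}
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, not part of Spark or Hadoop.
object LineReaderSketch {
  // Reads the lines found in `length` bytes of the file, starting at byte `start`.
  def readLines(filePath: String, start: Long, length: Long, conf: Configuration): Seq[String] = {
    // 1. Describe the byte range as a FileSplit (no locality hints).
    val fileSplit = new FileSplit(new Path(filePath), start, length, Array.empty[String])
    // 2. Create a placeholder task attempt and its context.
    val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0)
    val context = new TaskAttemptContextImpl(conf, attemptId)
    // 3. Create and initialize the line-oriented record reader.
    val reader = new LineRecordReader()
    reader.initialize(fileSplit, context)
    // 4. Drain the reader (the role RecordReaderIterator plays in Spark).
    val lines = ArrayBuffer.empty[String]
    try {
      while (reader.nextKeyValue()) {
        // The reader reuses one Text instance, so copy it to a String.
        lines += reader.getCurrentValue.toString
      }
    } finally {
      reader.close()
    }
    lines.toSeq
  }
}
```

For example, `LineReaderSketch.readLines("/tmp/lines.txt", 0L, 1024L, new Configuration())` would return the lines within the first 1024 bytes of a local file, assuming the default configuration resolves the path to the local file system.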

NOTE: `iterator` is used for `Iterator`-specific methods, i.e. `hasNext`, `next` and `close`.
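
The delegation described in the note can be illustrated with the Scala standard library only. Both classes below are hypothetical stand-ins: `InMemoryLines` plays the role of `RecordReaderIterator`, and `DelegatingLinesReader` plays the role of `HadoopFileLinesReader`. The wrapped value is named `delegate` rather than `iterator` because, in Scala 2.13, `Iterator` already defines a member called `iterator`.

```scala
import java.io.Closeable

// Hypothetical stand-in for RecordReaderIterator: iterates over fixed lines
// and records whether close() has been called.
final class InMemoryLines(lines: Seq[String]) extends Iterator[String] with Closeable {
  private val underlying = lines.iterator
  var closed: Boolean = false
  override def hasNext: Boolean = !closed && underlying.hasNext
  override def next(): String = underlying.next()
  override def close(): Unit = { closed = true }
}

// Hypothetical stand-in for HadoopFileLinesReader: every Iterator-specific
// method simply forwards to the wrapped iterator.
final class DelegatingLinesReader(delegate: Iterator[String] with Closeable)
    extends Iterator[String] with Closeable {
  override def hasNext: Boolean = delegate.hasNext
  override def next(): String = delegate.next()
  override def close(): Unit = delegate.close()
}
```

Keeping the reader a thin wrapper means resource handling lives in one place: closing the reader closes the wrapped iterator and, in the real class, the underlying record reader.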