Please read the following information carefully.
This exam consists of 16 multiple choice questions (16 pts), 3 open ended questions (7 pts), and 3 programming questions (9 pts).
The points for the multiple-choice part of the exam are computed as: \[\#correct\_answers - (\#incorrect\_answers / 3)\] This accounts for a 25% guessing correction, corresponding to the four-choice questions we use.
You have 2 hours to complete this exam.
Before you hand in your answers, check that the sheet contains your name and student number, both in the human and computer-readable formats.
The use of the book, notes, calculators or other sources is strictly prohibited.
Note that the order of the letters next to the boxes on your multiple-choice sheet may not always be A-B-C-D! Tip: mark your answers on this exam first, and only after you are certain of your answers, copy them to the multiple-choice answer form.
Read every question properly and in the case of the open questions, give all information requested, which should always include a brief explanation of your answer. Do not however give irrelevant information - this could lead to a deduction of points.
Note that the minimum score per (sub)question is 0 points.
You may write on this exam paper and take it home.
Exam is © 2021 TU Delft.
The command ls -l
lists the file
runTests.py
as follows:
-rwxr-xr-x 1 alice alice 1728 Nov 21 10:32 runTests.py
Which of the following is not correct for the
runTests.py
?
A. The size of the file is 1728 bytes
B. It can only be written by the user
alice
C. It can only be executed by the user
alice
D. Its permissions shortcode is 755
Which of the pipelines lists duplicate line (which are repeated
more than once) in the file file.txt
together with the
number of their occurrences? We assume that a line cannot be repeated by
ten or more times.
Reminder on the options of uniq
command:
-c
prefixes lines by the number of occurrences
-d
only prints duplicate lines, one for each group
A.
sort file.txt | uniq -c | sort -n | tr -s ' ' | grep "^ [1-9]* .*"
B.
sort file.txt | uniq -c | sort -n | tr -s ' ' | grep "^ [2-9]* .*"
C.
sort file.txt | uniq -d | sort -n | tr -s ' ' | grep "^ [1-9]* .*"
D.
sort file.txt | uniq -d | sort -n | tr -s ' ' | grep "^ [2-9]* .*"
What do the following lines evaluate?
val list = List(1,2,3,4)
val unknown = list.foldRight(Nil:List[Int])(_ :: _)
A.
List(4, List(3, List(2, List(1))))
B. List(1,2,3,4)
C. List(4,3,2,1)
D.
List(1, List(2, List(3, List(4))))
Which Spark RDD function does the following signature correspond to? \[ RDD[A].f(z: B, op1: (B, A) \rightarrow B, op2: (B, B) \rightarrow B ) \rightarrow B \]
A. reduce
B. fold
C. aggregate
D. mapPartitions
Which of the following is false about fault-tolerance of Spark?
A. Relies on the immutability of RDDs
B. Uses RDD lineage to identify what needs to be
recomputed
C. Recomputes DAG to reassign the task of the faulty
node
D. There is no need to restart the whole application
from scratch if a worker node fails
Monad
interface in Scala:A.
trait Monad[M[_]] {
def unit[S](a: S) : M[S]
def flatMap[S, T] (m: M[S], f: S => M[T]) : Monad[T]
}
B.
trait Monad[M[_]] {
def unit[S](a: S) : M[S]
def map[T] (m: M[S], f: S => M[T]) : Monad[T]
}
C.
trait Monad[M[_]] {
def map[T] (m: M[S], f: S => M[T]) : Monad[T]
def reduce[T,B](init: B, f: (B,T) => M[B]): Monad[B]
}
D.
trait Monad[M[_]] {
def map[T] (m: M[S], f: S => M[T]) : Monad[T]
def flatMap[S, T] (m: M[S], f: S => M[T]) : T
}
Which of the following statements about Spark is false?
A. Transformations are not executed until an action
is called
B. mapValues
is an example of a function
that requires shuffling
C. Allows the programmer to customize the partitioning
of pair RDDs
D. Lazy transformations enable Spark to run more
efficiently
In a system with 3 nodes A, B, C, six events took place in the following order:
msg1
to Bmsg2
to Amsg1
msg2
msg3
to Cmsg3
Which pair of events are not causally related (i.e., they are concurrent) to each other? (You can compute vector clocks for checking causal dependencies.)
A. (Event 2, Event 4)
B. (Event 3, Event 4)
C. (Event 3, Event 5)
D. (Event 3, Event 6)
Which of these statements about replication architectures is false?
A. With synchronous replication, the write
operations cannot be processed if a follower does not respond
B. Asynchronous replication guarantees that the
followers are up-to-date with the leader
C. Multi-leader replication requires a policy to
resolve conflicting write requests
D. Leaderless replication processes read/write queries
based on a quorum of replicas
Which of these sentences about the Raft consensus algorithm is false?
A. Every Raft term has exactly one leader
B. Raft defines 2 cluster states: “Leader election” and
“Log replication”
C. Raft ensures that only servers with up-to-date logs
can become leader
D. The Raft algorithm works by assuming that the
exchanged messages are valid and true
What is Byzantine fault tolerance?
A. Resilience against multiple node crashes
B. Resilience against multiple message losses
C. Resilience against partitioned network
D. Resilience against malicious nodes
We’re designing a system that replicates a sequence of user commands on a geographically distributed set of nodes. Our system ensures that all previous commands are replicated to all the nodes before processing a new one.
Given that our system runs on an unreliable network, which of the properties cannot always be satisfied by our system?
A. Atomicity
B. Availability
C. Consistency
D. Durability
A GFS chunkserver/HDFS datanode is responsible to:
A. Store filesystem path information
B. Split the data into partitions
C. Store data partitions
D. Replicate the data onto multiple disks
Which of the following is false about Pregel and Bulk synchronization parallel (BSP) model?
A. BSP is an edge-centric approach where the
algorithm iterates over the edges
B. BSP executes in supersteps each of which involves
local computation followed by message sending
C. Supersteps are globally synchronized among all
vertices
D. Pregel is an adaptation of BSP for graph
processing
An advertisement company wants to monitor the sequence of videos viewed by a registered user in one sitting. What type of streaming windows would be the most suitable to analyze the user behavior?
A. Sliding window
B. Tumbling window
C. Global window
D. Session window
What will be produced by the following Flink code snippet:
.map(c => (c.id, 1)).keyBy(x => x._1).sum(1) dataStream
A. Stream of the sums per ID
B. Stream with a single number (the total sum)
C. Stream of increasing integers
D. Runtime Exception
x
whose initial value is 0
. The operations in the history are
labelled with A-G. Is the given execution history linearizable? If yes,
provide a valid linearization. If not, provide an example operation that
violates linearizability.
(3pts) Given the following Spark program, write
the types of the variables a
,
b
, and, c
.
case class Student(sid: Int, sname: String)
case class Enrollment(sid: Int, cid: Int)
val students: RDD[Student] = // reads students from a file into an RDD
val enrollments: RDD[Enrollment] = // reads enrollments from a file into an RDD
val a = students.map(s => (s.sid, s.sname))
val b = enrollments.groupBy(_.sid).mapValues(_.size)
val c = a.join(b)
(2pts) What are wide and narrow transformations in Spark? Describe them with 1-2 sentences and provide an example for each type of transformations.
See the FP practice questions in Weblab.