Please read the following information carefully.

Multiple choice questions (16 questions, 16 pts)

 

  1. The command ls -l lists the file runTests.py as follows:

    -rwxr-xr-x 1 alice alice 1728 Nov 21 10:32 runTests.py

    Which of the following is not correct for the runTests.py?

    A. The size of the file is 1728 bytes
    B. It can only be written by the user alice
    C. It can only be executed by the user alice
    D. Its permissions shortcode is 755

 

  1. Which of the pipelines lists duplicate line (which are repeated more than once) in the file file.txt together with the number of their occurrences? We assume that a line cannot be repeated by ten or more times.

    Reminder on the options of uniq command:
    -c prefixes lines by the number of occurrences
    -d only prints duplicate lines, one for each group

    A. sort file.txt | uniq -c | sort -n | tr -s ' ' | grep "^ [1-9]* .*"
    B. sort file.txt | uniq -c | sort -n | tr -s ' ' | grep "^ [2-9]* .*"
    C. sort file.txt | uniq -d | sort -n | tr -s ' ' | grep "^ [1-9]* .*"
    D. sort file.txt | uniq -d | sort -n | tr -s ' ' | grep "^ [2-9]* .*"

 

  1. What do the following lines evaluate?

    val list = List(1,2,3,4)
    val unknown = list.foldRight(Nil:List[Int])(_ :: _)

    A. List(4, List(3, List(2, List(1))))
    B. List(1,2,3,4)
    C. List(4,3,2,1)
    D. List(1, List(2, List(3, List(4))))

 

  1. Which Spark RDD function does the following signature correspond to? \[ RDD[A].f(z: B, op1: (B, A) \rightarrow B, op2: (B, B) \rightarrow B ) \rightarrow B \]

    A. reduce
    B. fold
    C. aggregate
    D. mapPartitions

 

  1. Which of the following is false about fault-tolerance of Spark?

    A. Relies on the immutability of RDDs
    B. Uses RDD lineage to identify what needs to be recomputed
    C. Recomputes DAG to reassign the task of the faulty node
    D. There is no need to restart the whole application from scratch if a worker node fails

 

  1. Choose the correct implementation of the Monad interface in Scala:

A.

      trait Monad[M[_]] {
        def unit[S](a: S) : M[S]
        def flatMap[S, T] (m: M[S], f: S => M[T]) : Monad[T]
      }

B.

        trait Monad[M[_]] {
          def unit[S](a: S) : M[S]
          def map[T] (m: M[S], f: S => M[T]) : Monad[T]
        }

C.

        trait Monad[M[_]] {
          def map[T] (m: M[S], f: S => M[T]) : Monad[T]
          def reduce[T,B](init: B, f: (B,T) => M[B]): Monad[B]
        }

D.

        trait Monad[M[_]] {
          def map[T] (m: M[S], f: S => M[T]) : Monad[T]
          def flatMap[S, T] (m: M[S], f: S => M[T]) : T
        }

 

  1. Which of the following statements about Spark is false?

    A. Transformations are not executed until an action is called
    B. mapValues is an example of a function that requires shuffling
    C. Allows the programmer to customize the partitioning of pair RDDs
    D. Lazy transformations enable Spark to run more efficiently

 

  1. In a system with 3 nodes A, B, C, six events took place in the following order:

    • Event 1. A sends msg1 to B
    • Event 2. B sends msg2 to A
    • Event 3. B receives msg1
    • Event 4. A receives msg2
    • Event 5. B sends msg3 to C
    • Event 6. C receives msg3

    Which pair of events are not causally related (i.e., they are concurrent) to each other? (You can compute vector clocks for checking causal dependencies.)

    A. (Event 2, Event 4)
    B. (Event 3, Event 4)
    C. (Event 3, Event 5)
    D. (Event 3, Event 6)

 

  1. Which of these statements about replication architectures is false?

    A. With synchronous replication, the write operations cannot be processed if a follower does not respond
    B. Asynchronous replication guarantees that the followers are up-to-date with the leader
    C. Multi-leader replication requires a policy to resolve conflicting write requests
    D. Leaderless replication processes read/write queries based on a quorum of replicas

 

  1. Which of these sentences about the Raft consensus algorithm is false?

    A. Every Raft term has exactly one leader
    B. Raft defines 2 cluster states: “Leader election” and “Log replication”
    C. Raft ensures that only servers with up-to-date logs can become leader
    D. The Raft algorithm works by assuming that the exchanged messages are valid and true

 

  1. What is Byzantine fault tolerance?

    A. Resilience against multiple node crashes
    B. Resilience against multiple message losses
    C. Resilience against partitioned network
    D. Resilience against malicious nodes

 

  1. We’re designing a system that replicates a sequence of user commands on a geographically distributed set of nodes. Our system ensures that all previous commands are replicated to all the nodes before processing a new one.

    Given that our system runs on an unreliable network, which of the properties cannot always be satisfied by our system?

    A. Atomicity
    B. Availability
    C. Consistency
    D. Durability

 

  1. A GFS chunkserver/HDFS datanode is responsible to:

    A. Store filesystem path information
    B. Split the data into partitions
    C. Store data partitions
    D. Replicate the data onto multiple disks

 

  1. Which of the following is false about Pregel and Bulk synchronization parallel (BSP) model?

    A. BSP is an edge-centric approach where the algorithm iterates over the edges
    B. BSP executes in supersteps each of which involves local computation followed by message sending
    C. Supersteps are globally synchronized among all vertices
    D. Pregel is an adaptation of BSP for graph processing

 

  1. An advertisement company wants to monitor the sequence of videos viewed by a registered user in one sitting. What type of streaming windows would be the most suitable to analyze the user behavior?

    A. Sliding window
    B. Tumbling window
    C. Global window
    D. Session window

 

  1. What will be produced by the following Flink code snippet:

    dataStream.map(c => (c.id, 1)).keyBy(x => x._1).sum(1)

    A. Stream of the sums per ID
    B. Stream with a single number (the total sum)
    C. Stream of increasing integers
    D. Runtime Exception

Open-ended questions (3 questions, 7 pts)

  1. (2pts) The figure below depicts an execution history that operates on a distributed integer variable x whose initial value is 0. The operations in the history are labelled with A-G. Is the given execution history linearizable? If yes, provide a valid linearization. If not, provide an example operation that violates linearizability.

  1. (3pts) Given the following Spark program, write the types of the variables a, b, and, c.

      case class Student(sid: Int, sname: String)
      case class Enrollment(sid: Int, cid: Int)
    
      val students: RDD[Student] = // reads students from a file into an RDD 
      val enrollments: RDD[Enrollment] = // reads enrollments from a file into an RDD
    
      val a = students.map(s => (s.sid, s.sname))
      val b = enrollments.groupBy(_.sid).mapValues(_.size)
      val c = a.join(b)
  2. (2pts) What are wide and narrow transformations in Spark? Describe them with 1-2 sentences and provide an example for each type of transformations.

Functional programming questions (3 questions, 9 points)

See the FP practice questions in Weblab.