Please read the following information carefully.

Multiple choice questions (16 questions, 16 pts)

  1. The command ls -l lists the file runTests.py as follows:

    -rwxr-xr-x 1 alice alice 1728 Nov 21 10:32 runTests.py

    Which of the following is not correct for the runTests.py?

  1. Which of the pipelines lists duplicate line (which are repeated more than once) in the file file.txt together with the number of their occurrences? We assume that a line cannot be repeated by ten or more times.

    Reminder on the options of uniq command:
    -c prefixes lines by the number of occurrences
    -d only prints duplicate lines, one for each group

  1. What do the following lines evaluate?

    val list = List(1,2,3,4)
    val unknown = list.foldRight(Nil:List[Int])(_ :: _)
  1. Choose the correct implementation of the Monad interface in Scala:
      trait Monad[M[_]] {
          def unit[S](a: S) : M[S]
          def flatMap[S, T] (m: M[S], f: S => M[T]) : Monad[T]
      }
        trait Monad[M[_]] {
          def unit[S](a: S) : M[S]
          def map[T] (m: M[S], f: S => M[T]) : Monad[T]
        }
        trait Monad[M[_]] {
          def map[T] (m: M[S], f: S => M[T]) : Monad[T]
          def reduce[T,B](init: B, f: (B,T) => M[B]): Monad[B]
        }
        trait Monad[M[_]] {
          def map[T] (m: M[S], f: S => M[T]) : Monad[T]
          def flatMap[S, T] (m: M[S], f: S => M[T]) : T
        }
  1. Which of the following is false about fault-tolerance of Spark?
  1. Which Spark RDD function does the following signature correspond to? \[ RDD[A].f(z: B, op1: (B, A) \rightarrow B, op2: (B, B) \rightarrow B ) \rightarrow B \]
  1. Which of the following statements about Spark is false?
  1. In a system with 3 nodes A, B, C, six events took place in the following order:

    • Event 1. A sends msg1 to B
    • Event 2. B sends msg2 to A
    • Event 3. B receives msg1
    • Event 4. A receives msg2
    • Event 5. B sends msg3 to C
    • Event 6. C receives msg3

    Which pair of events are not causally related (i.e., they are concurrent) to each other? (You can compute vector clocks for checking causal dependencies.)

  1. Which of these statements about replication architectures is false?
  1. Which of these sentences about the Raft consensus algorithm is false?
  1. What is Byzantine fault tolerance?
  1. We’re designing a system that replicates a sequence of user commands on a geographically distributed set of nodes. Our system ensures that all previous commands are replicated to all the nodes before processing a new one.

    Given that our system runs on an unreliable network, which of the properties cannot always be satisfied by our system?

  1. A GFS chunkserver/HDFS datanode is responsible to:
  1. Which of the following is false about Pregel and Bulk synchronization parallel (BSP) model?
  1. An advertisement company wants to monitor the sequence of videos viewed by a registered user in one sitting. What type of streaming windows would be the most suitable to analyze the user behavior?
  1. What will be produced by the following Flink code snippet:

    dataStream.map(c => (c.id, 1)).keyBy(x => x._1).sum(1)
    • Stream of the sums per ID
    • Stream with a single number (the total sum)
    • Stream of increasing integers
    • Runtime Exception

Open-ended questions (3 questions, 7 pts)

  1. (2pts) The figure below depicts an execution history that operates on a distributed integer variable x whose initial value is 0. The operations in the history are labelled with A-G. Is the given execution history linearizable? If yes, provide a valid linearization. If not, provide an example operation that violates linearizability.

Answer:

It is not linearizable.

The operation E cannot read x=1 after the operation G which takes effect before E and reads x=2. 

It would be linearizable if G reads x=1 or E reads x=2
  1. (3pts) Given the following Spark program, write the types of the variables a, b, and c.

      case class Student(sid: Int, sname: String)
      case class Enrollment(sid: Int, cid: Int)
    
      val students: RDD[Student] = // reads students from a file into an RDD 
      val enrollments: RDD[Enrollment] = // reads enrollments from a file into an RDD
    
      val a = students.map(s => (s.sid, s.sname))
      val b = enrollments.groupBy(_.sid).mapValues(_.size)
      val c = a.join(b)

Answer:

val a: RDD[(Int, String)]

val b: RDD[(Int, Int)]

val c: RDD[(Int, (Student, Int))]
  1. (2pts) What are wide and narrow transformations in Spark? Describe them with 1-2 sentences and provide an example for each type of transformations.

Answer:

Wide transformations are transformations that involve a shuffle of the data between the partitions.

Example: `groupByKey()`, `join()`

Narrow transformations are transformations for which each input partition will contribute to only one output partition and they do not require shuffle between partitions.

Example: `map()`, `filter()`

Functional programming questions (3 questions, 9 points)

See the FP practice questions in Weblab.