This page contains several research ideas that students can pursue under my supervision. They can be adapted for a BSc part-time project, or for a full MSc dissertation. All ideas should result in (at least) one publication.
Per-Class-Per-Instance Cost-Aware Classifiers
Existing Machine Learning techniques are oblivious to the cost of choosing a wrong class for a particular instance. Instead, a cost-matrix is used that assumes that all instances have a static cost when misclassified with the wrong class. But for the purposes of finding a suitable transporter, misclassifying a small dog as a cat is better than misclassifying a large dog as a cat. The goal is to create a Evolutionary approach for multi-class classification that considers costs per-instance and class.
Self-driving Monster Truck
I have bough a MotoTC monster truck that includes apps for android and iPhone remote control. The plan is to reverse engineer the bluetooth protocol, in order to build a self driving app that uses the phone camera to get input and use neural networks to drive the car for different tasks.
Hypertrophic Cardiomyopathy Impact Variant Prediction
This is a collaboration with the Instituto de Medicina Molecular, in which from a dataset of different variants, we want to predict which ones have an higher probability of being pathogenic. For this, we can rely on other tools that perform machine-learning classification using different features.
Programming Languages and Parallel Programming
Quick-checking Dependent Typed Programs
Type checking dependent typed programs is only feasible if the dependent annotations follow a limited language (DML, LiquidHaskell) . For full dependent typed programs, that task is much harder and has been proved not to terminate. In this work, we propose property based testing, à lá QuickCheck, for evaluating whether expressions have the right type. This work consists on building a prototype compiler for expressing programs with Dependent Types, and quick-checking them.
app : Vect n a -> Vect m a -> Vect (n + m) a app Nil ys = ys app (x :: xs) ys = x :: app xs ys
Extracting Parallelism from Dependent-typed Programs
Polly and similar tools extract parallelism from for-loops in C programs. Extracting loop-based parallelism from higher-level languages is harder. In this project, students will use Refinement Types information about lists and other data structures, to verify if different iterations of the loop can be safely executed in parallel.
Using Dependent Types to Improve Genetic Programming
Strongly Typed Genetic Programming has been shown to outperform Standard GP. I believe that Dependent Typed Genetic Programming can have better performance, specially in Program Synthesis. This approach should be tested in regular GP benchmarks, as well as program synthesis benchmarks.
Using Dependent Types in Program Synthesis
Based on the previous idea, Dependent Types can be used to improve the usage of genetic programming in program synthesis. Furthremore, Dependent Types can be used to specify the intended behavior of programs, from which random inputs can be generated, connecting this idea to the Quick-checking Dependent Typed programs. The combination of both techniques can be used to create an Automatic Programming environment.
Similar to X10, Chapel or Aeminium, Par is a language with parallel by default semantics. It targets multicore CPUs and GPUs, and provides a higher level abstraction than current approaches (OpenMP, OpenCL or Cuda). The goal is to apply compiler-time granularity techniques to produce efficient programs, even in the presence of irregular data.
A Cooperative Time Cost Model Type System
Cost-Models are necessary for efficient parallelization of programs. The cost-model of an AST expression can be obtained by composing the cost-model of children nodes. However, in function invocations, it is necessary to infer the cost-model of function invocations. This is very similar to how bidirectional type systems work. This work consists on designing a secondary type system for a programming language that expresses the complexity of functions. This information is then used for evaluating whether a function call will execute asynchronously or not in the context of automatic parallelization.
An Automatic GPU-CPU Load Balancer for Julia
Julia is a high-performant high-level programming language. Think python with the speed of C (thanks to the LLVM JIT compiler). Julia has support for GPU computing but requires explicit identification of GPU operations. This idea intends to create a back-end agnostic API that schedules to the GPU or CPU based on the computation features. The approach should be similar to AeminiumGPU.
A Unified OS-level Task Manager
Nowadays most computers are built on multicore processors. Multicore processors scale from high-end servers with 32 hardware threads to pocket-size quad-core smartphones. Thus, programmers must write parallel programs to speed up their programs.
If we look at production systems, several applications already support parallelism. Some examples are web-servers (Apache, Nginx, ...) and databases (Oracle, Postgres, ...). These applications distribute their work across all threads of the machine, to improve performance. The problem is that they consider as available targets for threads all the processors on the machine, regardless of other applications running on the same machine. For instance: On a quad-core machine running both Nginx and Postgres, both applications take 4 threads, when the best would be for each one to occupy only 2, avoiding context-switching between threads.
This idea consists on developing a kernel-level task scheduler and an API for informing each program of the available parallel processors.
Evolutionary Turtles All The Way Down: an Automatic Configurable Parallel Genetic Algorithm Framework
Evolutionary Algorithms have several operators that can be used in many different configurations. Finding the best configuration is hard. One way to find it is to use another Evolutionary Algorithm (Meta-Heuristic).
Evaluating the performance of a evolutionary algorithm can take time, so it should be parallelizable. However, there are several configurations for parallelization. Another Evolutionary Algorithm can be used to find the best.
This is a joint project with Nuno Lourenço.
A GPU-backed Triple Store for Efficient SPARQL Queries
This idea consists on using GPU processing to improve the performance of a triple store database. GPUs have been successful for high performance databases and I believe that for triple stores it can perform even better.
MISO: a Cell-Based Programming Language for Parallel Dependable Programs
MISO is a programming language based on the cell model (similarly to the Actor model). MISO programs can be compiled with a certain tradeoff of dependability and efficiency. Right now MISO has been implemented as Scala and Rust DSLs. However, these DSLs do not consider the MISO-specific constraints, which are necessary for a correct compilation.