
Student Topics

  • All the topics are offered both as M.Sc. Theses and as M.Sc. Projects.
  • If you are interested in a topic, contact us at  with an email whose subject begins with "Application for MSc Thesis - ..." or "Application for MSc Project - ...". Please also attach your transcript of records to the application email.
  • All the topics demand a deep knowledge of Machine Learning, a good knowledge of Deep Learning, and some knowledge of Automated Machine Learning. We therefore strongly encourage only students who have passed the Machine Learning and Deep Learning courses to apply.
  • For technical questions regarding the details of a topic, please feel free to approach the contact person indicated in the description of each topic. (Their contact information can be found under the People category in the navigation bar to the left.)


Many-fidelity DPLs 


Recently, it has been shown that algorithm learning curves along an incremental fidelity/resource (model size, number of training instances) follow a power-law formulation [1][2][3], which in turn helps predict final performance from lower fidelities. Furthermore, it has been shown that incorporating this assumption into an ensemble of neural networks, where every network is conditioned to follow a power law (DPL) in its response [4], speeds up hyperparameter optimization and yields state-of-the-art results on a variety of benchmarks compared to prior work.
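
As a minimal illustration of the power-law assumption (a sketch only, not the DPL ensemble of [4]), the snippet below fits f(x) = y_inf - a * x^(-c) to the observed prefix of a learning curve and extrapolates it to the full budget; the synthetic curve and all constants are illustrative assumptions.

```python
# Minimal sketch: fit a power law y(x) = y_inf - a * x**(-c) to the observed
# prefix of a learning curve and extrapolate to the full budget.
# Illustrative only; this is not the DPL ensemble of [4].
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, y_inf, a, c):
    # y_inf: asymptotic performance; a, c: scale and decay of the power law
    return y_inf - a * np.power(x, -c)

# Synthetic validation accuracies observed at low fidelities (epochs 1..10)
rng = np.random.default_rng(0)
x_obs = np.arange(1, 11, dtype=float)
y_obs = 0.9 - 0.5 * x_obs ** -0.7 + rng.normal(0.0, 0.005, size=x_obs.shape)

# Fit the three power-law parameters to the observed prefix of the curve
params, _ = curve_fit(power_law, x_obs, y_obs, p0=[1.0, 0.5, 0.5], maxfev=10000)

# Predict performance at the full budget (e.g. 100 epochs) from low fidelities
print("predicted accuracy at epoch 100:", power_law(100.0, *params))
```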

In this topic, we would like to widen the scope of our investigation by considering more than one fidelity: studying performance across many fidelities, deciding dynamically which hyperparameter configurations are most promising, and deciding along which fidelity dimension their training should be pursued.

Contact Person: Arlind Kadra

[1] Rosenfeld, Jonathan S., Amir Rosenfeld, Yonatan Belinkov, and Nir Shavit. "A Constructive Prediction of the Generalization Error Across Scales." International Conference on Learning Representations, 2020.
[2] Rosenfeld, Jonathan S., Jonathan Frankle, Michael Carbin, and Nir Shavit. "On the Predictability of Pruning Across Scales." Proceedings of the 38th International Conference on Machine Learning, 2021.
[3] Ghorbani, Behrooz, Orhan Firat, Markus Freitag, Ankur Bapna, Maxim Krikun, Xavier Garcia, Ciprian Chelba, and Colin Cherry. "Scaling Laws for Neural Machine Translation." International Conference on Learning Representations, 2022.
[4] Kadra, Arlind, Maciej Janowski, Martin Wistuba, and Josif Grabocka. "Deep Power Laws for Hyperparameter Optimization." arXiv preprint arXiv:2302.00441 (2023).

NAS for RL 


Neural Architecture Search (NAS) is an area that has been intensely researched in the last few years [1, 2, 3]. Work in this area aims to automate the process of defining the neural architecture of deep learning methods. Despite the vast amount of literature, there are still only a few applications of the proposed methods to Reinforcement Learning (RL) [4].

One-shot NAS methods [5, 6, 7] have shown good performance in optimizing the architectural parameters of deep learning methods for Supervised Learning. Betty [8] has been proposed as a scalable, user-friendly, and modular automatic differentiation library for multilevel optimization that makes such methods easier to use.
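
To give a flavour of what these one-shot methods optimize, here is a minimal sketch of the continuous relaxation used by DARTS [5]: a softmax over architecture parameters weights the candidate operations on a single edge. It is written in plain PyTorch for illustration (it does not reproduce Betty's actual API), and the candidate operations and dimensions are assumptions.

```python
# Sketch of a DARTS-style mixed operation: a softmax over architecture
# parameters weights the candidate operations on one edge of the cell.
# Plain PyTorch for illustration; a real setup would handle the bilevel
# weight/architecture updates, e.g. via a library such as Betty [8].
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Illustrative candidate operations on a single edge
        self.ops = nn.ModuleList([
            nn.Identity(),                                   # skip connection
            nn.Linear(dim, dim),                             # linear op
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),   # non-linear op
        ])
        # One architecture parameter per candidate operation
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        # Continuous relaxation: weight each candidate op by softmax(alpha)
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# Usage: in the bilevel formulation, model weights and alpha are optimized
# on different objectives/splits (e.g. training vs. validation loss).
edge = MixedOp(dim=16)
out = edge(torch.randn(4, 16))
print(out.shape)  # torch.Size([4, 16])
```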

The student is expected to compare the performance of different one-shot NAS methods for RL using Betty. It is interesting to understand which methods work better for RL and how much adaptation is needed to apply them to the specific case of RL.

Contact Person: Gresa Shala

[1] Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. "Neural architecture search: A survey."
Journal of Machine Learning Research, 20(55):1–21, 2019.
[2] Martin Wistuba, Ambrish Rawat, and Tejaswini Pedapati. "A survey on neural architecture search."
arXiv preprint arXiv:1905.01392, 2019.
[3] Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Xiaojiang Chen, and Xin Wang.
"A comprehensive survey of neural architecture search: Challenges and solutions."
arXiv preprint arXiv:2006.02903, 2020.
[4] Miao, Yingjie, et al. "Differentiable Architecture Search for Reinforcement Learning."
International Conference on Automated Machine Learning. PMLR, 2022.
[5] Liu, Hanxiao, Karen Simonyan, and Yiming Yang. "DARTS: Differentiable architecture search."
arXiv preprint arXiv:1806.09055 (2018).
[6] Xue, Song, et al. "IDARTS: Interactive Differentiable Architecture Search."
Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[7] Chen, Xiangning, et al. "DrNAS: Dirichlet neural architecture search."
arXiv preprint arXiv:2006.10355 (2020).
[8] Choe, Sang Keun, et al. "Betty: An automatic differentiation library for multilevel optimization."
arXiv preprint arXiv:2207.02849 (2022).

Meta-learning Backbone Architectures for Tabular Datasets


Meta-learning, or learning-to-learn, is a problem setting concerned with finding algorithms or architectures that learn efficiently across different datasets (meta-datasets). In this master thesis/project topic, you will explore how to use auxiliary datasets to learn an architecture that performs well on new tabular datasets. The architecture predicts the label of a new sample, or query, based on a set of observations known as the support set. The student is expected to compare against previous work with related architectures using DeepSets [1], Transformers [2], or fully convolutional networks combined with transformers [3]. For meta-learning the architecture, many open datasets are available [4].
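
As a rough sketch of the support/query setup (not the architectures of [1]-[3]), the snippet below uses a DeepSets-style permutation-invariant encoder: each (features, label) support pair is embedded, the embeddings are pooled, and the pooled context is combined with the query features to predict its label. All dimensions, names, and layer choices are illustrative assumptions.

```python
# Minimal sketch of predicting a query label from a support set with a
# permutation-invariant (DeepSets-style) encoder. Dimensions, pooling and
# the final classifier are illustrative assumptions, not the models of [1]-[3].
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupportSetClassifier(nn.Module):
    def __init__(self, num_features: int, hidden: int = 64, num_classes: int = 2):
        super().__init__()
        self.num_classes = num_classes
        # Encodes one (features, one-hot label) support pair
        self.pair_encoder = nn.Sequential(
            nn.Linear(num_features + num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        # Combines the pooled support representation with the query features
        self.head = nn.Sequential(
            nn.Linear(hidden + num_features, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, support_x, support_y, query_x):
        # support_x: (n_support, num_features), support_y: (n_support,) int labels
        # query_x: (n_query, num_features)
        y_onehot = F.one_hot(support_y, self.num_classes).float()
        context = self.pair_encoder(torch.cat([support_x, y_onehot], dim=-1)).mean(dim=0)
        context = context.expand(query_x.size(0), -1)
        return self.head(torch.cat([context, query_x], dim=-1))  # query logits

model = SupportSetClassifier(num_features=8)
logits = model(torch.randn(20, 8), torch.randint(0, 2, (20,)), torch.randn(5, 8))
print(logits.shape)  # torch.Size([5, 2])
```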

Contact Person: Sebastian Pineda

[1] Iwata, Tomoharu, and Atsutoshi Kumagai. "Meta-learning from tasks with heterogeneous attribute spaces." Advances in Neural Information Processing Systems 33 (2020): 6053-6063.
[2] Hollmann, Noah, et al. "TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second." arXiv preprint arXiv:2207.01848 (2022).
[3] Arik, Sercan Ö., and Tomas Pfister. "Tabnet: Attentive interpretable tabular learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 8. 2021.
[4] Bischl, Bernd, et al. "OpenML benchmarking suites." arXiv preprint arXiv:1708.03731 (2017).

Meta-Learning Ensembling Strategies


Ensembling is a popular technique that aggregates the predictions of base models to improve the performance of classical machine learning models [1] and neural networks [2]. The selection of the base models is very important, as they should be accurate yet diverse. Common approaches include greedy construction of the ensemble or Bayesian Optimization (BO). Meta-learning, on the other hand, has proven effective for BO in hyperparameter optimization (HPO) by leveraging information from metrics on auxiliary datasets [3]. In this master thesis/project, the student will address the question: how can we leverage information from ensembling metrics on auxiliary datasets?
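
To make "greedy construction of the ensemble" concrete, here is a minimal Caruana-style ensemble-selection sketch on held-out predictions, in the spirit of what Auto-sklearn [1] does; the data, metric, and number of rounds are synthetic placeholders, not part of the proposed thesis method.

```python
# Minimal sketch of greedy ensemble selection (with replacement) on held-out
# predictions; data and metric are synthetic placeholders.
import numpy as np

def greedy_ensemble(val_probs, val_y, n_rounds=10):
    # val_probs: (n_models, n_samples, n_classes) validation probabilities
    chosen, current = [], np.zeros_like(val_probs[0])
    for _ in range(n_rounds):
        # Pick the base model whose addition maximizes validation accuracy
        scores = []
        for m in range(val_probs.shape[0]):
            mixed = (current * len(chosen) + val_probs[m]) / (len(chosen) + 1)
            scores.append((mixed.argmax(axis=1) == val_y).mean())
        best = int(np.argmax(scores))
        chosen.append(best)
        current = (current * (len(chosen) - 1) + val_probs[best]) / len(chosen)
    return chosen  # selected model indices; repetitions act as ensemble weights

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=(5, 100))   # 5 models, 100 samples, 3 classes
labels = rng.integers(0, 3, size=100)
print(greedy_ensemble(probs, labels))
```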

Contact Person: Sebastian Pineda

[1] Feurer, Matthias, et al. "Auto-sklearn 2.0: Hands-free automl via meta-learning." arXiv preprint arXiv:2007.04074 (2020).
[2] Lakshminarayanan, Balaji, Alexander Pritzel, and Charles Blundell. "Simple and scalable predictive uncertainty estimation using deep ensembles." Advances in neural information processing systems 30 (2017).
[3] Wistuba, Martin, and Josif Grabocka. "Few-shot Bayesian optimization with deep kernel surrogates." arXiv preprint arXiv:2101.07667 (2021).

Attention-based Representation Learning for Neural Architecture Search


Neural Architecture Search (NAS) automates the design process of Neural Networks (NNs). While the most prominent discrete encoding schemes are efficient in commonly researched search spaces (NAS-Bench-101, NAS-Bench-201, DARTS), they are inapplicable to the optimization of the deep topologies found in real-world applications or to the problem of discovering NNs from scratch [1]. Inspired by Yan et al. [2], we decouple architecture representation learning from architecture search in order to perform a robust, task-agnostic construction of the latent space. Our main contributions are a novel attention-based model (based on [3]) and a pretraining procedure for hyperparameter optimization. Further, we propose a Bayesian Optimization strategy that performs the search efficiently, outperforming existing baselines.
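
As a minimal illustration of the attention mechanism underlying such an encoder, the sketch below implements a single-head graph-attention layer in the spirit of GAT [3], operating on node (operation) features and the adjacency matrix of an architecture DAG. The one-hot operation encoding, the dimensions, and the mean pooling are illustrative assumptions, not the model developed in this topic.

```python
# Minimal single-head graph-attention layer in the spirit of GAT [3]:
# each node (operation) attends over its neighbours in the architecture DAG.
# Dimensions and the operation encoding are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared linear transform
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention scoring vector

    def forward(self, x, adj):
        # x: (n_nodes, in_dim) operation features, adj: (n_nodes, n_nodes) 0/1 adjacency
        h = self.W(x)
        n = h.size(0)
        # Pairwise attention logits e_ij = a([h_i || h_j])
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1), negative_slope=0.2)
        # Mask non-edges, normalize over neighbours, aggregate
        e = e.masked_fill(adj == 0, float("-inf"))
        attn = torch.nan_to_num(torch.softmax(e, dim=-1))  # no-neighbour rows -> zeros
        return attn @ h                                    # updated node embeddings

# Usage: encode a 4-node cell with one-hot operation features, then pool the
# node embeddings into a latent vector for a downstream BO surrogate.
ops = torch.eye(4)                                         # one-hot op encoding (assumption)
adj = torch.tensor([[0, 1, 1, 0], [0, 0, 0, 1],
                    [0, 0, 0, 1], [0, 0, 0, 0]], dtype=torch.float)
layer = GraphAttentionLayer(in_dim=4, out_dim=8)
z = layer(ops, adj).mean(dim=0)                            # latent architecture embedding
print(z.shape)  # torch.Size([8])
```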

Contact Person: Maciej Janowski

[1] Simon Schrodi, Danny Stoll, Binxin Ru, Rhea Sanjay Sukthanker, Thomas Brox, Frank Hutter. Towards Discovering Neural Architectures from Scratch. https://openreview.net/forum?id=Ok58hMNXIQ
[2] Shen Yan, Yu Zheng, Wei Ao, Xiao Zeng, Mi Zhang. Does Unsupervised Architecture Representation Learning Help Neural Architecture Search? https://arxiv.org/abs/2006.06936
[3] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio. Graph Attention Networks. https://arxiv.org/abs/1710.10903