Find my Portfolio & Download my CV

Welcome to the internet home of Steffen Knödler. Feel free to look around,
check out my GitHub and my Portfolio, download my CV here,
or just get in touch.

Portfolio

This is an overview of my knowledge and a brief summary of relevant projects I completed for academic, self-learning, and hobby purposes. If you like what you see and want to chat with me about my portfolio, work opportunities, or collaboration, write me an email.

1 Projects

Software Applications

  • Flutter Apps (Repo): Mobile apps, including a quiz app, a locations app, and other small, independent Flutter components.
    Dart - Flutter (Google APIs)

  • Serverless App to automate my Job Search (Repo): In this short article (https://lnkd.in/d4BBqWE), I explain how you can build your own serverless application to automate your job search. The application crawls data from the job platform StepStone and extracts job titles and company names. I included options to filter the search results: eliminate results that include certain words in the job title (such as "Senior" positions), filter by company name, and/or specify words that must be included in the job title. The application is scheduled with CloudWatch and stores its data in a DynamoDB table.
    Python, Serverless Framework (Node.js), AWS Lambda, CloudWatch, DynamoDB

  • Interactive Analytic App (Repo): I used Dash to build an interactive analytic app. Dash is an open-source framework for building web-based analytic applications. Built on top of Plotly.js, React, and Flask, Dash ties modern UI elements such as dropdowns, sliders, and graphs directly to your analytical Python code.
    Python, Dash by Plotly framework

  • Web Apps (Repo): Besides my personal website (accessible via https://sknoedler.github.io), I have developed a number of small web apps from scratch, including apps to search for jobs, search for legal documents, display my portfolio, post a contact request, and much more. The web apps, including their databases, are deployed on Heroku, AWS, IBM Cloud, or GitHub.
    AWS, Heroku, Django, Flask, Vue.js, PostgreSQL – Python, JavaScript, HTML, CSS (Bootstrap)
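The filtering step described for the serverless job-search app can be sketched in plain Python. This is a minimal illustration, not the app's actual code: the `Job` class and `filter_jobs` function are hypothetical, the company filter is assumed to be an exclusion list, and every required title word is assumed to have to appear (case-insensitive substring matching).

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical record holding the two fields the crawler extracts.
    title: str
    company: str


def filter_jobs(jobs, exclude_title_words=(), exclude_companies=(), require_title_words=()):
    """Keep jobs whose title contains none of the excluded words, whose company is
    not in the exclusion list, and whose title contains every required word.
    All comparisons are case-insensitive (an assumption, not taken from the app)."""
    excluded = [w.lower() for w in exclude_title_words]
    required = [w.lower() for w in require_title_words]
    blocked = {c.lower() for c in exclude_companies}
    result = []
    for job in jobs:
        title = job.title.lower()
        if any(w in title for w in excluded):
            continue  # e.g. drop "Senior" positions
        if job.company.lower() in blocked:
            continue  # drop unwanted companies
        if not all(w in title for w in required):
            continue  # title must mention every required word
        result.append(job)
    return result
```

In the deployed app, a function like this would run after crawling and before writing to DynamoDB; keeping it free of AWS calls makes it easy to test locally.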

Machine Learning

  • Injury Prevention Project (html): Injury accidents happen every day in New York State. In this project, I used data from the New York State Department of Motor Vehicles to predict whether a person is injured when a car crash happens at a certain time and location. Model 1: Logistic Regression; Model 2: Random Forest; Model 3: Gradient Boosting. The logistic regression model outperforms both random forest and gradient-boosted trees, with an AUC of 0.742 using all features. On top of that, I developed a demo that officials could use to assess the danger of a certain area.
    PySpark (Spark ML), Python (pandas, NumPy)
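AUC, the metric reported for the injury prediction models, is the probability that a randomly chosen positive example (here, an injury) receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition; the `auc` helper and the toy scores are hypothetical and not taken from the project.

```python
def auc(labels, scores):
    """Pairwise AUC: the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
print(auc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.6]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, so 0.742 means the model ranks a randomly drawn injury/no-injury pair correctly about 74% of the time. The nested loop is O(n·m); for large datasets a sort-based computation or a library evaluator is the practical choice.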

Analysis and Visualization

  • Analysis of Airbnb data (pdf): Project paper that describes the analysis of an Airbnb dataset with over 41,000 tuples and 95 attributes. In this paper, I apply the Cross-Industry Standard Process for Data Mining (CRISP-DM) to analyze the dataset.
    Python (NumPy, pandas, Matplotlib, scikit-learn), Tableau

  • Analysis of Student Performance per Class (html): Analyzes and visualizes data from five different schools that taught the same 35-lesson math course this semester.
    R (dplyr, ggplot2, tidyr, ggpubr, reshape2)

  • Analysis of Smoking data (html): Does smoking affect your lung capacity? It is well known that smoking is bad for your health, but how can we quantify this statistically? The exercise covers means, hypothesis testing, correlation, and histograms.
    Python (NumPy, pandas, Matplotlib)
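The mean comparison, hypothesis test, and correlation from the smoking exercise can be sketched with Python's standard library. This is an illustration under assumptions, not the exercise's own code: it uses Welch's two-sample t statistic (one common choice when the group variances may differ) and Pearson's correlation coefficient, and the helper names `welch_t` and `pearson_r` are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two
    independent samples, e.g. lung capacity of smokers vs. non-smokers."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)
```

To turn the t statistic into a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test.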


2 Knowledge

2.1 API

2.2 Git

  • Git (html): Explains how Git works and how to use its commands

2.3 NoSQL

2.4 Cloud Computing

  • Docker & Kubernetes Basics for Developers (pdf): Walks through the fundamentals of Docker and Kubernetes, including using different Docker containers, an introduction to Docker Swarm, running Docker in an AWS VM, setting up a Kubernetes cluster via Minikube, creating and managing a Deployment via kubectl, and scaling and updating the app.

  • Deploying a Cloud Foundry App using Toolchains in IBM Cloud (pdf): Build-Test-Deploy: Cloud Foundry is an open-source PaaS that can be used to deploy and scale applications without managing any servers. It uses a container-based architecture that can run applications written in any language. The PaaS includes a self-service application engine to run the application, an automation engine that makes deployment and maintenance possible, a command-line interface to interact with the environment, and integration of development tools. Toolchains include services for continuous delivery, source control, issue tracking, and online editing.

  • Google Cloud Platform Basics (pdf): Covers creating and managing VM instances, instance templates, and instance groups, and describes setting up a Virtual Private Cloud (VPC).

  • Hosting a Webpage in AWS Cloud (html): Amazon provides a variety of web services, one of which is Amazon Elastic Compute Cloud (EC2). EC2 provides scalable compute capacity in the cloud, letting users balance computing power, memory, and networking resources. EC2 allows users to configure a virtual machine (VM), or instance, to deploy application content. In this task, I show and explain how I hosted a webpage from GitHub on an EC2 instance.

  • Virtualization with VMware vSphere (html) (pdf): vSphere is a server virtualization platform from VMware. It serves as a complete platform for implementing and managing virtual machine (VM) infrastructure at large scale.

2.5 Data Science Learning

  • Object recognition with deep learning (html): In this task, I use a multilayer perceptron as the learning model. The idea is to take the raw pixels of an image and predict the image's category, which makes this a classification problem. I use the Spark ML library to apply the multilayer perceptron for classification.

  • Feature Engineering (html): Describes basic preprocessing steps in PySpark that are typically used in feature engineering to improve model accuracy, including renaming, dropping, bucketizing, standardizing, and more.

  • Data Set Types and Preparation (html): Data set types are the key foundation for any data analysis. This section introduces the basics of record and non-record data types and provides information about data preparation.

  • Deep Probabilistic Programming (url): One of our Data Science professors, Thomas Hamelryck, introduced us to Deep Probabilistic Programming (Deep PP) during a lecture in the second semester of my studies. This short blog post on LinkedIn introduces the idea of Deep PP and explains what it is all about.

  • R Programming Language (pdf): Basic R syntax; foundational R programming concepts such as data types, vector arithmetic, and indexing; and operations in R, including sorting, data wrangling using dplyr, and making plots.

  • R Visualization (pdf): Covers data visualization and the corresponding code in R.
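Two of the feature engineering steps listed above, bucketizing and standardizing, can be illustrated in plain Python. This sketch only mirrors the idea behind PySpark's `Bucketizer` and `StandardScaler`; the helper functions and the example age splits are hypothetical, and Spark's `StandardScaler` does not subtract the mean unless `withMean=True` is set.

```python
import bisect
from statistics import mean, stdev


def bucketize(values, splits):
    """Map each value to the index of the interval [splits[i], splits[i+1])
    that contains it; the last interval also includes its upper bound."""
    last_bucket = len(splits) - 2
    buckets = []
    for v in values:
        if v < splits[0] or v > splits[-1]:
            raise ValueError(f"{v} is outside the range of the splits")
        i = bisect.bisect_right(splits, v) - 1
        buckets.append(min(i, last_bucket))  # clamp the top boundary value
    return buckets


def standardize(values):
    """Center values on their mean and scale by the sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical age buckets: [0, 18), [18, 65), [65, 120]
print(bucketize([5, 18, 40, 70, 120], [0, 18, 65, 120]))  # [0, 1, 1, 2, 2]
```

Bucketizing turns a continuous feature into a categorical one, while standardizing puts features on a comparable scale, which matters for models such as logistic regression.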


3 Completed Academic Courses:

Foundations of Development of IT [SQL, Python] This course covers each phase of the IT development cycle individually. The cycle is broken down into themes: initiation, system concept development, planning, requirements analysis, design, code-based development, integration and test, implementation, operation and maintenance, and termination.
Cloud Management [Docker, AWS, IBM Cloud, Google Cloud, VMware] Cloud service creation and management. Practical experience in using, creating, and managing digital services across data centers and hybrid clouds. Strategic choices for cloud digital service solutions across open data centers and software-defined networks.
Big Data Analytics [Python, Apache Spark, Hadoop 2.0, Spark ML, Pandas, Matplotlib] Students learn to obtain, screen, clean, link, manipulate, analyze, and display data while creating summaries, overviews, models, analyses, and basic tables, histograms, trees, and scattergrams. They use Python and Apache Spark to explore classic and modern machine learning techniques (such as deep learning) within a big data context, including sentiment analysis via supervised learning, recommendation systems via unsupervised learning, and credit score prediction via random forests.
Data Analytics [R] General overview of data analytics techniques, familiarity with particular real-world applications, the challenges involved in these applications, and future directions of the field. Optional hands-on experience with available software packages.
Statistics (15 ECTS) Descriptive methods of univariate data analysis; additional methods and correlation analysis; probability calculus; stochastic variables and distributions, distribution models; sums and means of sampling variables; parameter estimation; confidence intervals; statistical tests; further specific test problems; linear regression model.
Mathematics (10 ECTS) The main focus areas of this course are linear algebra (including, amongst others, matrix calculus, matrix inverses, determinants of matrices, linear systems of equations, and vector calculus), sequences and series, and differential calculus (including, amongst others, differentiation of real functions, Taylor expansions, and univariate and multivariate optimization of functions with and without constraints, using the Lagrange method).
Databases (5 ECTS) [SQL] This course offers an in-depth discussion of modern database system architectures and query languages for databases. The focus lies on the relational database model and relational query languages (SQL). Other topics covered are data integrity, integrity constraints, and database design.
Introduction to Data Science (7.5 ECTS) [Python] The course covers the following tentative topics: foundations of statistical learning and probability theory; classification methods, such as linear models and k-nearest neighbors; regression methods, such as linear regression; Bayesian statistics; clustering; and dimensionality reduction and visualization techniques, such as principal component analysis (PCA).
Navigating Complexity: Mapping, Visualisation & Decision-making (15 ECTS) [Tableau, Gephi, WebCrawler] The course teaches students to describe and analyze complexity within an empirical case. Students are introduced to a range of conceptual and technical tools for generating and visualizing data and analyzing complexity. Throughout the course, students experiment with different techniques for generating data and visualizing complexity. Based on case work, students reflect on how visualizations work as simplifications and how they can inform decision-making.
Big Data Processes (7.5 ECTS) [Tableau, Python] This course covers analytics and visualization (e.g., exploratory data analysis, classification, clustering), as well as challenges of Big Data processes (e.g., handling of personal data). Furthermore, students practice communicating and presenting results and reflections during the exercises. Students learn to apply a number of software tools for analytics and visualization.
Marketing Analytics (6 ECTS) The primary goal of this course is learning quantitative analytical methods and concepts that lead to better marketing decisions. In the exercises and mentoring sessions accompanying the lectures, students gain the competence to apply analytical methods and concepts independently. In addition, numerous case studies and practical lectures demonstrate the practical relevance of the methods and concepts learned.
Introduction to Information Management (5 ECTS) [Python] The course covers application systems and information systems, as well as business processes and their support by ERP systems. In addition, the lectures address basic knowledge related to data management and the concept of data modeling. Subsequently, an introduction to programming is given, using the programming language Python.
Business Information Systems (6 ECTS) [SQL] This course covers the fundamentals, development, and introduction of information and communication systems (ICS) for enterprises. It includes the functionality, architecture, and development of ICS as well as Business Process Reengineering (BPR).

4 Completed Online Courses:

  • Harvard Data Science: Visualization

    • Harvard Online Course: Basic data visualization principles and how to apply them using ggplot2.
  • Harvard Data Science: R Basics

    • Harvard Online Course: Builds a foundation in R and teaches how to wrangle, analyze, and visualize data.
  • Introduction to Python

    • Datacamp.com: Covers powerful ways to store and manipulate data, as well as data science tools for starting your own analyses.
  • Intro to SQL for Data Science

    • Datacamp.com: Covers everything you need to know to begin working with databases.
  • Python Programming

    • Codecademy.com: Covers everything you need to know to work with the Python programming language.
  • Python Bootcamp

    • Udemy.com: Python Bootcamp: Vom Anfänger zum Profi, inkl. Data Science (From Beginner to Pro, incl. Data Science)

Send me an Email!

Address


60389, Frankfurt am Main

Phone


on request