
Wednesday, July 15, 2020

What degree do I need for a career in bioinformatics?

The type of bioinformatics degree you pursue should be influenced by the type of job that you want. Go to indeed.com, and take a look at some of the positions you may be interested in (dream big!). If they all require at least a master's, then you had better get serious about graduate school. In general, getting a graduate degree in bioinformatics can make you more competitive for bioinformatics jobs. Having a graduate degree can be a tie-breaker in your favor if you are competing against someone with otherwise identical qualifications for a position, and a graduate degree will qualify you for more positions since many positions require at least a master’s, if not a PhD.

Bachelor's
A bachelor’s degree is traditionally a four-year degree in the United States. It will land you a job that, day to day, resembles a traditional software engineering job. Web interfaces, data visualizations, dashboards, databases, and maybe even some pipetting will be your bread and butter. If your job description mentions research, then you can expect to spend a lot of time coding other people's algorithms. If this is the type of position you want and/or are already qualified for (especially fresh out of college with a computer science degree), then the time that would otherwise be devoted to graduate school might be better spent on internships and getting work experience. Still, if you are on the fence, a master's will not hurt and might help your job prospects (and starting salary).

Master’s degree
A master's degree is a graduate degree that you can complete after receiving a bachelor's degree. Most master's degrees, including ones in bioinformatics, will take you two years to complete. Master's degrees are coursework-based, but some master’s degrees include a thesis option where you work on a novel research project of your own. Graduate certificate programs are abridged master’s programs that do not award you a master’s upon completion. Certificate programs are generally easier to get into, are cheaper, and may be an option if you are having trouble getting into a traditional master's program.

Master's programs are easier to get into than PhD programs because students are typically expected to pay their own way for the master's. However, the investment can be worth it since a master’s degree pays better than an undergraduate degree, and bioinformatics is no exception. Completing a master’s degree will help you land more interesting jobs, gain more independence, and work on more advanced projects. A master's in bioinformatics can help you expand your biological knowledge if you have a computational background, and it will certainly help your computational skills if you have a background in a biomedical field. If you are interested in developing novel algorithms, then you really need to get at least a master's and should probably consider a PhD.

There are programs that offer master's degrees in bioinformatics completely online. This might be an option if you are having trouble getting into a traditional program, but beware that you will lose out on critical in-person networking that can help you get a job later [note: this was written before the current global pandemic; online courses are quickly becoming the norm for at least the short-term].

Doctoral Degree (Doctor of Philosophy aka PhD)
A PhD is a research-intensive degree, and a PhD in the computational sciences, like bioinformatics, will take at least four years to complete. It is not unheard of for a PhD to take longer since progress depends on you and your project. The first two years of a PhD generally involve coursework or a combination of coursework and novel research under a faculty advisor/mentor (also referred to as a principal investigator or PI). In the remaining years, you are expected to be a productive researcher who works on and publishes novel research, helps your PI write grant proposals, and attends scientific conferences to showcase your work. Your pièce de résistance will be your dissertation, a hundred-plus page document in which you compile your research into a coherent story (or several sub-stories with similar themes). In addition to writing your dissertation, you must orally defend it in front of a committee of PhD-wielding professors that you and your PI have picked. Your dissertation work often gets turned into one or many peer-reviewed publications before or after your graduation (depending on your university and your circumstances). You are normally required to publish one or more first-author papers in peer-reviewed journals before you are allowed to graduate.

One huge benefit to choosing a PhD over a master's is a teaching assistantship, which essentially lets you go to graduate school for free. A teaching assistantship provides a full-time PhD student with a stipend (a salary) in the $20,000+ range and a tuition waiver (reduced or free tuition) in exchange for helping to teach undergraduate courses. This often ends up being in the realm of 20 hours a week of work grading papers, holding office hours, teaching labs, and even lecturing courses. If you play your cards right and are very, very frugal with your stipend, then you can come out of graduate school debt-free. PhD students usually get priority over master's students for teaching assistantships (in some programs, master's students are not eligible for teaching assistantships), so this is something to consider if you are on the fence between a master's and a PhD. Many PhD programs give their students assistantships by default, but others may require a separate application. PhD students are also eligible for research assistantships. These follow the same idea as a teaching assistantship, except that instead of being paid to teach, you are paid from your PI's grant (or your own grant/fellowship) to do research.

A PhD in bioinformatics can open the most doors for you. Many positions require graduate degrees, and positions that do not will very likely credit your education years towards years of experience. If you are interested in academia, then a PhD is mandatory for trying to become a faculty member/professor at a university. If you are looking to one day lead a team of researchers or be a director of a program, then this definitely requires a PhD. If you want to start your own bioinformatics company, then you had better get a PhD so investors will take you seriously. Bioinformatics has A LOT of PhDs in the field, so it could be challenging to get taken seriously as a founder if you are not part of the PhD club.

What program should I choose?
Your field of study matters, but it also does not matter. A graduate bioinformatics degree can cover enough biological and computational topics to make you a fairly well-rounded scientist. However, many of the hard sciences (especially computer science, statistics, and mathematics) can give you the computational foundation needed for a career in bioinformatics. Graduate work in seemingly unrelated areas can have direct applications to bioinformatics. Electrical engineering comes to mind as a field that you might not think of but is leading the way in machine learning and artificial intelligence research. Keep in mind that not all programs are created equal. Although a bioinformatics degree might seem like the logical choice for a career in bioinformatics, if you have the opportunity to get a hard science degree from a better program/university, especially if there are groups there doing bioinformatics research, then you should definitely consider this as an alternative. Note that most job postings do not require "x degree in bioinformatics" but rather say "x degree in bioinformatics, computer science, statistics, or related field".

Closing thoughts
I personally recommend at least a master's degree for people who ask me for career advice, mostly due to how broad the field is. However, you can be a very successful bioinformatician regardless of the level of your degree. There are bachelor's- and master's-holding bioinformaticians out there who can go toe-to-toe with a PhD-holding bioinformatician any day, but this very much depends on the person, their education/background, and their drive.

Cannot afford school or do not have the time to go back? Check out my guide for getting a bioinformatics education online for free.

Monday, January 8, 2018

Get a bioinformatics education online for free

This post contains affiliate links, meaning when you click a link and make a purchase, we receive a commission that helps support this site.

You can now get an entire bioinformatics education online for free. This is thanks to 1) the explosion in Massive Open Online Courses (MOOCs) and 2) academics making more and more of their work (especially books) open access, meaning they're available to anyone at no cost. MOOCs are a great option for students looking to supplement their education and make themselves more attractive to bioinformatics hiring managers. A biology major can use MOOCs to learn how to program and analyze data, while other STEM majors lacking biology credits can take MOOCs on molecular biology and next generation sequencing technologies. MOOCs are also a great option for professionals looking for a career change or promotion since you can learn extra skills on your own time. MOOCs can even be an option for someone looking to get into science as a hobby (bioinformatics is one of the few fields in the biomedical sciences where you can analyze the genetic code of cancer from home in your underwear). Courses start every few weeks, so there are ample opportunities to begin as soon as you're ready.

When it comes to online platforms, it's best to stick with ones partnered with or owned by a major university. Two of the most popular platforms are Coursera (Johns Hopkins) and edX (Harvard). Because these platforms are affiliated with top universities in the United States, it is not uncommon to have your course taught by a leading researcher in a particular topic. These platforms offer a range of bioinformatics-related courses from introductory biology and introductory programming to advanced concepts like deep learning and systems biology. Most of these courses have an audit option, which allows you to take them for free. However, the downside to auditing is that you might not be able to access certain course materials (Coursera), you won't be able to submit certain assignments or get grades for your work (Coursera), and you won't receive a certificate proving that you successfully completed the course (both Coursera and edX).

In addition to having standalone courses, Coursera and edX both feature paid specializations (edX calls them "XSeries"). Specializations are series of related courses designed to help you master a specific topic. On Coursera, many specializations build your knowledge on a topic and culminate in a final Capstone Project that you can put straight onto GitHub. If you read my guide on your first bioinformatics project, then you know the importance of having a project to showcase your talents to prospective employers. Completing a specialization earns you a Specialization Certificate, and these certificates should be used to enhance your resume/CV and to show potential employers that you are competent in a particular topic. Not only are they good CV padding, but paid specializations will give you a little skin in the game and make you more accountable to yourself (many who start free MOOCs never finish them).

There are literally thousands of courses on Coursera alone, so it can be hard to sift through them all to find the ones that are worth your time. Here, I give a handpicked list of courses that will give you the tools you need to get up to speed quickly in bioinformatics. This is by no means an exhaustive list since bioinformatics touches on so many different areas. Instead, I focus on core competencies and then suggest optional courses that you can take depending on your interests and the type of job you'd like to get. For my recommendations, I tend to favor specializations because they give you a more cohesive experience instead of feeling like a patchwork of disconnected information.

I start off with introductory courses for building your knowledge of both biology and programming. If you are already comfortable with biology or have an undergraduate biology degree, then I would suggest focusing more on the programming and data analysis courses. It's better to know some biology and spend time perfecting your programming skills than it is to be an expert at biology who flounders at simple coding tasks. On the other hand, if you are a STEM major familiar with programming and data analysis, then it is probably worth your time learning biology to understand the context of the bioinformatics problems you will be working on. If you are coming from outside of STEM, then I really recommend going through all of the introductory courses. Getting a solid foundation in the basics is necessary for handling the advanced concepts to come.

Next, I list intermediate courses. These are the bread and butter of bioinformatics and include a lot of the type of work you can expect as a bioinformatician. It's fair to say a lot of bioinformatics positions focus on gene expression pipelines and data analysis, so this is reflected in these intermediate course recommendations. Even if you aren't super familiar with more exotic types of data (I work almost exclusively with proteomics and metabolomics data), the education you receive in these intermediate courses will give you the ability to tackle most types of problems you come across.

Then, I list advanced topics to further hone your skills. These courses tend to be on the more difficult side, but they are well worth the time investment. Finally, I end with a number of popular, freely available books. These make good companions to the courses or can even serve as good introductory texts if you prefer self-learning.

Introductory Courses

Biology

Essential Human Biology: Cells and Tissues
Perfect for someone with zero biology experience, this course will give you an introduction to the structure and function of human cells and tissues and lay a foundation for more advanced topics.

Introduction to Biology - The Secret of Life
An introductory-level molecular biology course hosted by Professor Eric Lander, one of the leaders of the Human Genome Project. The course content reflects the topics taught in the MIT introductory biology courses and many biology courses across the world.

DNA: Biology’s Genetic Code
Most bioinformatics projects (probably the vast majority) revolve around next generation sequencing technologies and genomics, so it's really important to get a solid foundation in this area. This course explores the basics of DNA structure, packaging, replication, and manipulation. 

Computer Science and Programming

Fundamentals of Computing Specialization (Option #1)
I list two computer science tracks here depending on your preferences. I love this specialization because of the emphasis on developing critical mathematical problem solving and algorithmic thinking. These skills are the bread and butter of a bioinformatician. 

The courses include:
  • An Introduction to Interactive Programming in Python (Part 1)
  • An Introduction to Interactive Programming in Python (Part 2)
  • Principles of Computing (Part 1) 
  • Principles of Computing (Part 2)
  • Algorithmic Thinking (Part 1)
  • Algorithmic Thinking (Part 2)
  • The Fundamentals of Computing Capstone Exam

Python for Everybody (Option #2)
This track will take you through 'fundamental programming concepts including data structures, networked application program interfaces, and databases, using the Python programming language'. This track places a bit more emphasis on practical application, so it might be a good option if you find yourself struggling with the first track (which is a bit more mathy/technical). 

The courses include:
  • Programming for Everybody (Getting Started with Python)
  • Python Data Structures
  • Using Python to Access Web Data
  • Using Databases with Python
  • Capstone: Retrieving, Processing, and Visualizing Data with Python

Optional

Introduction to the Biology of Cancer
There are a lot of cancer-related bioinformatics jobs in big pharma and at academic centers, so knowing a thing or two about cancer can help you land a job. This optional course introduces the molecular biology of cancer (oncogenes and tumor suppressor genes) as well as the biologic hallmarks of cancer. 

Epigenetic Control of Gene Expression
This optional course introduces you to epigenetics, the study of heritable changes in gene function that do not involve changes in the DNA sequence. This is a more specialized course and might not be applicable to everyone, but you can run into trouble landing a job that involves epigenetics or epigenomics without some background in this area.

Introduction to Computer Science aka CS50x
Learning to program is important, but learning to think like a computer scientist is equally important. This is a good skill to develop for a bioinformatician since you will be called upon to solve complex problems at the intersection of computer science and biology. CS50x is an immensely popular course taught in person at Harvard that has been adapted for the edX platform.

Intermediate Courses

Bioinformatics

Bioinformatics Specialization (Option #1)
You've got three really great tracks to pick from for your core bioinformatics competencies. These three tracks cover roughly the same topics, so you should look into each to see which one piques your interest and is right for you. This first specialization comes from the creators of rosalind.info (a free bioinformatics practice site). The first course in this track, "Finding Hidden Messages in DNA (Bioinformatics I)", is listed as a Top 50 MOOC of All Time, and there are even two print textbooks (highly recommended) that go along with the course: Bioinformatics Algorithms Volume I and Volume II.

The courses include:
  • Finding Hidden Messages in DNA (Bioinformatics I)
  • Genome Sequencing (Bioinformatics II)
  • Comparing Genes, Proteins, and Genomes (Bioinformatics III)
  • Molecular Evolution (Bioinformatics IV)
  • Genomic Data Science and Clustering (Bioinformatics V)
  • Finding Mutations in DNA and Proteins (Bioinformatics VI)
  • Bioinformatics Capstone: Big Data in Biology (Bioinformatics VII)

Genomic Data Science Specialization (Option #2)
Taught by renowned data scientist and biostatistician Jeff Leek (Johns Hopkins), this specialization will give you the skills you need to understand, analyze, and interpret data from next generation sequencing experiments. It features hands-on exercises with the command line, Python, R, Bioconductor, and Galaxy.

The courses include:
  • Introduction to Genomic Technologies
  • Genomic Data Science with Galaxy
  • Python for Genomic Data Science
  • Algorithms for DNA Sequencing
  • Command Line Tools for Genomic Data Science
  • Bioconductor for Genomic Data Science
  • Statistics for Genomic Data Science
  • Genomic Data Science Capstone

Data Analysis for Life Sciences and Genomics Data Analysis (Option #3)
This track consists of two complementary XSeries. Both are taught by another renowned data scientist and biostatistician (Rafael Irizarry, Harvard). Data Analysis for Life Sciences '...is perfect for anyone in the life sciences who wants to learn how to analyze data. Problem sets will require coding in the R language to ensure learners fully grasp and master key concepts.' The second part of the track, Genomics Data Analysis, '...is an advanced series that will enable students to analyze and interpret data generated by modern genomics technology... is perfect for those who seek advanced training in high-throughput technology data.'

The courses include:
  • Statistics and R
  • Introduction to Linear Models and Matrix Algebra
  • Statistical Inference and Modeling for High-throughput Experiments
  • High-Dimensional Data Analysis
  • Introduction to Bioconductor: Annotation and Analysis of Genomes and Genomic Assays
  • High-performance Computing for Reproducible Genomics
  • Case Studies in Functional Genomics

Other Intermediate Courses

Data Structures and Algorithms Specialization
Knowing the right algorithm to use can mean the difference between a job that takes an hour to run and one that takes a week. This specialization has a combination of theory and practice, and you will solve nearly 100 algorithmic coding problems in your language of choice (instead of just taking a multiple choice quiz like many MOOCs). This specialization also features two big, real-world projects: Big Networks and Genome Assembly. The first involves analyzing different networks (e.g., roads, social) and finding the shortest path. The second will cover assembly algorithms and assembling genomes from millions of short fragments of DNA.
  
The courses include:
  • Algorithmic Toolbox
  • Data Structures
  • Algorithms on Graphs
  • Algorithms on Strings
  • Advanced Algorithms and Complexity
  • Genome Assembly Programming Challenge
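
To give a flavor of the "finding the shortest path" type of problem mentioned above, here is a minimal sketch in Python. It is not taken from the specialization; the `shortest_path` function and the toy `roads` network are hypothetical names made up for illustration, and it assumes an unweighted network, where a breadth-first search is enough to find a path with the fewest hops.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return a fewest-hops path from start to goal as a list, or None."""
    # parents records how each node was first reached; it also serves
    # as the set of nodes that have already been visited.
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # goal cannot be reached from start


# Hypothetical toy road network (adjacency lists), for illustration only.
roads = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

print(shortest_path(roads, "A", "E"))  # prints ['A', 'B', 'D', 'E']
```

Because each node and each connection is looked at only once, this approach scales to large networks far better than trying every possible route, which is the kind of difference the paragraph above is getting at.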

Optional

Big Data Specialization
As a bioinformatician, working with tens of thousands of sequenced genomes or millions of radiology images (e.g. CAT scans) means you'll need to know a thing or two about big data. This specialization will give you hands-on experience with the tools and systems you need. These courses take you through the basics of using Hadoop with MapReduce, Spark, Pig and Hive, and you will be shown how to ask the right questions about data, how to communicate like a data scientist, and how to perform exploration of large, complex datasets. 

The courses include:
  • Introduction to Big Data
  • Big Data Modeling and Management Systems
  • Big Data Integration and Processing
  • Machine Learning With Big Data
  • Graph Analytics for Big Data
  • Big Data - Capstone Project

Mathematical Biostatistics Boot Camp 1 & Mathematical Biostatistics Boot Camp 2
These courses will help if you are struggling with some of the statistical concepts you encounter in your training as a bioinformatician. They introduce 1) the fundamental probability and statistical concepts used in elementary data analysis and 2) fundamental concepts in data analysis and statistical inference. These courses are appropriate for '...undergraduate students with junior or senior college-level mathematical training including a working knowledge of calculus. A small amount of linear algebra and programming are useful for the class, but not required.'


Advanced Courses

Machine Learning and Deep Learning

Machine Learning (Option #1)
This first option is actually a standalone course (in lieu of a specialization) because I like it so much. It is taught by one of the world's foremost experts in machine learning (Andrew Ng, Baidu Research/Stanford University). The fact that this course is available for anyone to take is mind-blowing. I highly recommend it. You will be given an introduction to machine learning, data mining, and statistical pattern recognition. You'll learn both supervised and unsupervised methods as well as best practices in machine learning. The course features numerous case studies and applications, so you'll be getting a taste of everything from '...building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.'

Machine Learning Specialization (Option #2)
If you want to spend a bit more time getting familiar with machine learning, then this four-course specialization in machine learning from the University of Washington is for you. 'Through a series of practical case studies, you will gain applied experience in major areas of Machine Learning including Prediction, Classification, Clustering, and Information Retrieval. You will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that can make predictions from data.'

The courses include:
  • Machine Learning Foundations: A Case Study Approach
  • Machine Learning: Regression
  • Machine Learning: Classification
  • Machine Learning: Clustering & Retrieval

Deep Learning Specialization
An entire Deep Learning specialization by Andrew Ng? Yes please! This specialization will teach you the foundations of deep learning, how to build neural networks, and how to run a successful deep learning project. The courses cover everything from convolutional networks and recurrent neural networks (RNN) to long short term memory (LSTM), Adam, Dropout, BatchNorm, Xavier/He initialization, and more. They teach you both the theory and how deep learning is applied in industry. This includes several case studies in healthcare, autonomous driving, sign language reading, music generation, and natural language processing. This specialization is taught in Python and TensorFlow.

The courses include:
  • Neural Networks and Deep Learning
  • Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization
  • Structuring Machine Learning Projects
  • Convolutional Neural Networks
  • Sequence Models

Other Advanced Courses

Systems Biology and Biotechnology Specialization
Many bioinformaticians work in systems biology, '...[a] field of study that focuses on complex interactions within biological systems, using a holistic approach to biological research.' This broad field utilizes a whole slew of different methodologies, so this specialization introduces you to topics like dynamical modeling, network and statistical modeling, "omics" technologies (e.g. genomics, proteomics), and single cell research technologies. Upon completion, you'll know how to combine experimental, computational, and mathematical methods to answer questions in a variety of biomedical fields. 

The courses include:
  • Introduction to Systems Biology
  • Experimental Methods in Systems Biology
  • Network Analysis in Systems Biology
  • Dynamical Modeling Methods for Systems Biology
  • Integrated Analysis in Systems Biology
  • Systems Biology and Biotechnology Capstone

Recommended Free Books

Programming

Advanced R - Free Online Version - Buy Print Version 
R programming book by data scientist Hadley Wickham (Chief Scientist at RStudio and Adjunct Professor of Statistics at the University of Auckland, Stanford University, and Rice University).

Think Python - Free Online Version (1st Edition) - Buy Print Version (2nd Edition) 
'Think Python is an introduction to Python programming for beginners. It starts with basic concepts of programming, and is carefully designed to define all terms when they are first used and to develop each new concept in a logical progression. Larger pieces, like recursion and object-oriented programming are divided into a sequence of smaller steps and introduced over the course of several chapters.'

Statistics and Data Science

An Introduction to Statistical Learning with Applications in R - Free Online Version - Buy Print Version 
'This book provides an introduction to statistical learning methods. It is aimed for upper level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist.'

Think Stats - Free Online Version - Buy Print Version 
'Think Stats is an introduction to Probability and Statistics for Python programmers. Think Stats emphasizes simple techniques you can use to explore real data sets and answer interesting questions. The book presents a case study using data from the National Institutes of Health. Readers are encouraged to work on a project with real datasets. If you have basic skills in Python, you can use them to learn concepts in probability and statistics. Think Stats is based on a Python library for probability distributions (PMFs and CDFs). Many of the exercises use short programs to run experiments and help readers develop understanding.'

Exploratory Data Analysis with R - Free Online Version - Buy Print Version 
'This book teaches you to use R to effectively visualize and explore complex datasets. Exploratory data analysis is a key part of the data science process because it allows you to sharpen your question and refine your modeling strategies.'

The Elements of Statistical Learning (2nd edition) - Free Online Version - Buy Print Version 
'During the past decade there has been an explosion in computation and information technology. With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting--the first comprehensive treatment of this topic in any book.'

Machine Learning

Understanding Machine Learning - Free Online Version  - Buy Print Version 
'Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides a theoretical account of the fundamentals underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics, the book covers a wide array of central topics unaddressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for advanced undergraduates or beginning graduates, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics and engineering.'

Biology

Molecular Biology of the Cell - Searchable Online Version (4th Edition) - Buy Print Version (6th Edition)
'Molecular Biology of the Cell is the classic in-depth text reference in cell biology. By extracting fundamental concepts and meaning from this enormous and ever-growing field, the authors tell the story of cell biology, and create a coherent framework through which non-expert readers may approach the subject. Written in clear and concise language, and illustrated with original drawings, the book is enjoyable to read, and provides a sense of the excitement of modern biology. Molecular Biology of the Cell not only sets forth the current understanding of cell biology (updated as of Fall 2001), but also explores the intriguing implications and possibilities of that which remains unknown.'

Molecular Cell Biology - Searchable Online Version (4th Edition)  - Buy Print Version (8th Edition)
'Modern biology is rooted in an understanding of the molecules within cells and of the interactions between cells that allow construction of multicellular organisms. The more we learn about the structure, function, and development of different organisms, the more we recognize that all life processes exhibit remarkable similarities. Molecular Cell Biology concentrates on the macromolecules and reactions studied by biochemists, the processes described by cell biologists, and the gene control pathways identified by molecular biologists and geneticists. In this millennium, two gathering forces will reshape molecular cell biology: genomics, the complete DNA sequence of many organisms, and proteomics, a knowledge of all the possible shapes and functions that proteins employ.'

Misc.

How to be a modern scientist - Free Online Version 
'A book about how to be a scientist the modern, open-source way.'
