Get The Most Out Of Jupyter Notebook As A Data Scientist

Posted on March 17, 2018

Python, Data science, Jupyter notebook, IPython, Tips, Cheat sheet - by Tshilidzi Mudau

Jupyter notebook is a popular tool amongst data scientists [1]. Many blog posts have already been written about the benefits of using Jupyter notebook as a data scientist, so I won't spend time repeating them. Instead, this post focuses on how you, as a data scientist, can make better use of Jupyter notebook. What motivated me to write it is that many Jupyter notebook users I have seen are not using it to its full potential, and as a result they miss out on most of its capabilities and benefits. This post is aimed at such users.

Keyboard Input Modes

Jupyter, just like Vim (Cite vim), has different keyboard input modes. The mode you are in determines which operations you can perform.

Edit Mode

Edit mode allows you to type code or text into a cell and is indicated by a green cell border. To enter Edit mode, press Enter. There is not much to Edit mode beyond typing text into cells (unfortunately, for most users, this is where their usage of Jupyter notebook ends). Below are some of the most useful actions available in Edit mode:

  1. Tab completion: press Tab after typing the first few characters of the function, command, or module you want. E.g. to call the print function, type pri and press Tab, and Jupyter notebook will complete the word for you. This works not only for built-in Python functions and modules but for pretty much any valid name in Jupyter (including the things you defined yourself).

  2. Indentation: use Ctrl + ] and Ctrl + [ to indent right and left respectively, instead of using your mouse or moving the cursor to the beginning of the line and then pressing Tab or Space repeatedly.

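To make the idea of prefix completion concrete, here is a minimal sketch using Python's standard-library rlcompleter module. It is not how Jupyter implements Tab completion (Jupyter relies on IPython's own completer); it only illustrates how a typed prefix such as pri is matched against built-in names and against names you defined yourself. The helper complete_prefix and the function load_sales_data are hypothetical names made up for this example.

```python
import rlcompleter


def complete_prefix(prefix, namespace=None):
    """Return every completion the stdlib completer offers for `prefix`.

    `namespace` is a dict of user-defined names; built-ins and keywords
    are always searched as well.
    """
    completer = rlcompleter.Completer(namespace if namespace is not None else {})
    matches = []
    state = 0
    while True:
        # complete() returns the state-th match, or None when there are no more.
        match = completer.complete(prefix, state)
        if match is None:
            break
        matches.append(match)
        state += 1
    return matches


# A built-in: the prefix "pri" resolves to the print function.
builtin_matches = complete_prefix("pri")


# A name you defined yourself (hypothetical, for illustration only).
def load_sales_data():
    return []


user_matches = complete_prefix("load_s", {"load_sales_data": load_sales_data})

print(builtin_matches)
print(user_matches)
```

The exact suffix the completer appends to callables (an opening parenthesis, for instance) varies between Python versions, so treat the printed strings as indicative only. In a notebook you get the same effect interactively by pressing Tab.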

(This is by no means a complete discussion of everything there is to know about Jupyter notebook. For more information, I suggest consulting its documentation.)

Acknowledgments

Thank you to Yoshua Bengio, Michael Nielsen, Dario Amodei, Eliana Lorch, Jacob Steinhardt, and Tamsyn Waterhouse for their comments and encouragement.


  1. Cite Jupyter Notebook.
