Programming

![wondermark](http://wondermark.com/comics/190.gif) # Programming > [wikipedia:](http://en.wikipedia.org/wiki/Programming_language) A programming language is a formal language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely. The earliest programming languages preceded the invention of the computer, and were used to direct the behavior of machines such as Jacquard looms and [player pianos](http://en.wikipedia.org/wiki/Player_piano). Thousands of different programming languages have been created, mainly in the computer field, and still many are being created every year. Most programming languages describe computation in an imperative style, i.e., as a sequence of commands, although some languages, such as those that support functional programming or logic programming, use alternative forms of description. When using a natural language to communicate with other people, human authors and speakers can be ambiguous and make small errors, and still expect their intent to be understood. However, figuratively speaking, computers "do exactly what they are told to do", and cannot "understand" what code the programmer intended to write. The combination of the language definition, a program, and the program's inputs must fully specify the external behavior that occurs when the program is executed. To make this easier, programming languages and implementations may provide many tools of specification and abstraction, and many libraries of re-usable routines and capabilities. ## Semiotics Programming languages and implementations can be understood in terms of semiotics: the syntax, semantics and pragmatics. ### Syntax **The relations among signs in formal structures.** Syntax specifies all possible (valid) combinations of the surface form. Usually textual, but can also be graphical. May be described using a grammar. A syntax checker verifies the input text matches the syntax rules, or may indicate an error otherwise. A syntactically correct piece of code is not necessarily semantically correct, just as a syntactically correct English sentence does not necessarily have a logical meaning ([E.g. Chomsky's "Colourless Green Ideas Sleep Furiously"](http://en.wikipedia.org/wiki/Colorless_green_ideas_sleep_furiously)) > The study of grammar, the rules of correct structure of language, can be traced back two and half thousand years in ancient India, where the belief had been held that the correct structure and intonation of words had power. ### Semantics **The relation between signs and meaning (the things to which they refer).** Semantics maps from program language syntax to program behavior. Some semantic constraints can be analyzed from the source code, others may only be detectable once the program runs. Industry languages tend to include more features aimed to detect semantic errors at compile-time, such as strict type systems. ### Pragmatics **The relation between signs and sign-using agents (e.g. programmers!).** In linguistics, pragmatics describes the relation between signs and the agent using them, and the way context relates to meaning. In computing, this may encompass the hardware (or virtual machine implementation) that a program runs on, the available libraries and resources, as well as real-world interactions and performance. The standard library and run-time system provided by a language implementation can be more important than the language itself. ## Programming languages There are thousands of programming languages in use today. From a purely abstract, mathematical point of view, most of them have equivalent computing power (delimited by the [universal Turing machine](http://en.wikipedia.org/wiki/Turing-computable_function#Models_equivalent_to_the_Turing_machine_model)). Practically however the choice of language/implementation to use may depend on factors of performance, familiarity, the quality of documentation and supporting libraries, ease of distribution, and so on. ### Try out different languages in the browser - [Codepad.org](http://codepad.org/) - [Repl.it](http://www.repl.it/languages) ### Hello, world Since the K&R book ["The C Programming Language"](http://cm.bell-labs.com/cm/cs/cbook/), many language tutorials begin with the ["hello, world"](http://en.wikipedia.org/wiki/Hello_world_program) program: a program that simply prints "hello, world" and exits. ```cpp // hello, world in C: int main() { printf("hello, world\n"); return 0; } ``` ```java // hello, world in Java: public class HelloWorld { public static void main(String [] args) { System.out.println("hello, world"); } } ``` ```php // hello, world in PHP: <?php echo("hello, world"); ?> ``` ```javascript // hello, world in JavaScript: console.log("hello, world"); ``` ```python # hello, world in Python: print("hello, world") ``` ```lua -- hello, world in lua: print("hello, world") ``` See [more examples at codepad.org](http://codepad.org/hello-world) ## Compiled language example: C The **C** programming language is one of the oldest still in wide use today, originally developed between 1969 and 1973 at AT&T Bell Labs. It became the 'de facto' language for system development in UNIX, and continues to be a primary language for software, operating systems, hardware devices, etc. today. It aims to provide a higher-level language that nevertheless does not obscure details of the underlying system. It is extremely well-defined and thus also serves as a consistent application binary interface (ABI) for operating between different programs and languages. For example, [LuaJIT can easily inter-operate with C via a foreign-function interface (FFI)](http://luajit.org/ext_ffi.html). Programs written in C stored in text files (with ".c" extension) and converted to binary applications (and libraries) using a **compiler** such as ```gcc``` or ```clang```; or simply invoked by the alias ```cc```. The following terminal command tells the compiler to compile the file ```hello.c``` and save the binary executable output (via the ```-o``` flag) to be called ```hello```: cc hello.c -o hello Now we can run this file like so: ./hello On Windows [it looks slightly different](http://msdn.microsoft.com/en-us/library/ms235639.aspx): cl hello.c hello ### Linking with a library What if the standard C run-time library doesn't offer capabilities we need? Then we can search for a library that already exists and link to that (so long as the license fits our needs). For example, C doesn't know how to read and write sound files, but the [libsndfile](https://github.com/erikd/libsndfile) library is designed to do just that. So the first thing we need to do is download & install it. ... Once installed, we need to tell the compiler where to find it. There are two parts to a library: the **binaries** which contain the actual machine code, and the **headers**, which are text files definining functions and types that tell you (and the compiler) how you can use the binaries. For example, library binaries might be installed into */usr/lib* or */usr/local/lib* on OSX/Linux, and headers into */usr/include* or */usr/local/include*. We can tell the compiler where to look using the **-I** and **-L** flags (**/I** and **/link /LIBPATH** on Windows): cc -I/usr/local/include hello.c -L/usr/local/lib -o hello But that only tells the compiler where to look. To actually use the headers we need to **#include** them in the source: #include <sndfile.h> And we also must tell the compiler to link the binary in our command, using the **-l** flag (or just the library name on Windows): cc -I/usr/local/include hello.c -L/usr/local/lib -lsndfile -o hello Sometimes the header file of a library is enough to understand how to use it, but usually we also want to refer to human-friendly [documentation and tutorial material](http://www.mega-nerd.com/libsndfile/api.html). This is only a *very very very basic* introduction to compiling C from the command line; reality can be *far far far more complicated*. Too complicated, really. ## Dynamic language example: Lua See [lua tutorial](lua.html) ## Data structures A data structure is a way to meaningfully organize data. Principally the task is to design a layout schema for memory that provides optimal performance for the desired task(s) the data structure will be used for. ### Contiguous memory E.g. 1D Arrays and regular structs in C. - If all data is local (no references or pointers to other locations), then the memory describes plain-old-data (**POD**), which can be very easily copied, stored, retrieved etc. Sometimes there may be some 'unused' space in arrays or structs, e.g. to align struct fields or array rows on certain byte-length boundaries. - Multi-dimensional dense arrays. We can represent a 10x10-element **2D array** using a 100-element 1D array, simply by addressing the memory accordingly. If we consider the 2D array in terms of rows and columns, we must choose between row-major or column-major ordering, i.e. whether horizontal or vertical neighbors are contiguous. This determines how iteration loops are nested (i.e. is the inner loop over X or Y?). The same principle can be applied to 3D or higher-dimensional arrays. - Nested structures. A struct describes the semantics of a block of memory: what **type** of data is to be found at a particular **offset** within the memory block. It is composable: we can create arrays of structs, and structs that embed arrays and other structs. So long as all nested structures are POD, the whole is POD. - Access. Accessing a desired point in an array or struct is extremely fast and cheap, especially if the region accessed is nearby other regions that were recently accessed. This applies equally to nested structures. ### Non-contiguous memory. Any data structure that uses pointers or references. One problem with POD data is that it cannot grow or shrink in size: the size is determined when it is allocated, because memory cannot move. For example, a list of active notes may need to dynamically vary according to performer behavior, or a list of active agents in a game varies according to player behavior. > One POD-friendly solution is to pre-allocate enough space for a maximum number of items, and do not allow this to be exceeded. Another is to allocate a new, larger array when needed, and copy data from the old to the new, updating any references that were using it. Reallocation can be expensive and should not be used frequently. The non-contiguous solution is to create a linked data structure, such as a linked list. A **linked list** contains a variable number of nodes, each of which contains or refers to the desired objects of the list, as well as *link* references to neighbor nodes. For example, a singly-linked list: ![linked list](http://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Singly-linked-list.svg/408px-Singly-linked-list.svg.png) (images courtesy of [wikipedia](http://en.wikipedia.org/wiki/Linked_list)) Each node could be in a totally different area of memory, but we can easily "walk the list" by following the links. The advantage of this approach is that we can easily insert or remove an item from the middle of the list, simply by changing the link references: ![linked list removal](http://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/CPT-LinkedLists-deletingnode.svg/380px-CPT-LinkedLists-deletingnode.svg.png) A **doubly linked list** nodes have references in both directions, allowing traversal in either direction as well as simplifying removal: ![doubly linked list](http://upload.wikimedia.org/wikipedia/commons/thumb/5/5e/Doubly-linked-list.svg/610px-Doubly-linked-list.svg.png) A **circular linked list** maps the tail to the head, allowing infinite traversal: ![circular linked list](http://upload.wikimedia.org/wikipedia/commons/thumb/d/df/Circularly-linked-list.svg/350px-Circularly-linked-list.svg.png) By adding another link we can define structures with hierarchy, such as trees. ![tree](http://cslibrary.stanford.edu/109/tree.gif) Using both arrays and linked lists we can create a number of useful data structures. Which one to use depends on the frequent and possible use cases in context. All the possible use cases must be supported, and the frequent use cases should perform optimally. Generally, if we require fast "random" access, an array is ideal, however if we require fast insertion and removal, a linked list is far better. ### Stack With a [stack](http://en.wikipedia.org/wiki/Stack_(data_structure)) we only ever manipulate one end of the list, adding or removing items. This is also called **LIFO** (last-in, first-out). If implemented using an array, all we need is an integer to indicate where the current top of the stack is. If using a linked list, all we need is a singly-linked list and a pointer to the current top. Stacks are very important to computing, and widely used in low-level language implementations. They are also used at higher levels, such as the Undo/Redo stack in any application. ![Stack](http://upload.wikimedia.org/wikipedia/commons/thumb/2/29/Data_stack.svg/391px-Data_stack.svg.png) ### Queue A [queue](http://en.wikipedia.org/wiki/Queue_(data_structure)) is also known as a FIFO (first-in, first-out), and implements pipe and stream behavior. We always add data at one end, and remove data at the other end. It can be implemented on a fixed-size array using two 'head' and 'tail' indices, though this limits the maximum number of elements in the queue. It can be implemented on a linked list by keeping a reference to the head and tail nodes. ![Queue](http://upload.wikimedia.org/wikipedia/commons/thumb/5/52/Data_Queue.svg/300px-Data_Queue.svg.png) A **deque** (pronounced "deck", like a deck of cards) is a "double-ended queue", in which items can be added and removed at both ends. It can be implemented on a doubly-linked list. The elements of a **priority queue** also have an associated "priority", with which they are ordered (sorted). High priority events are served before low ones. For example, the start time of an event can be used to sort a list of events, such that earlier events are served first. Generally the two main tasks are to a) insert element with priority, and b) retrieve/remove the highest priority element. Often the list is preserved in a sorted state by being careful to only insert at the right location (a random-access insert). This is called "insertion sort". By doing so, retrieval/removal always occurs at one end. ### Dictionary (aka Associative array or Map) A [dictionary](http://en.wikipedia.org/wiki/Associative_array) stores pairs, mapping from keys to values. Keys are unique (no key appears twice). The dictionary can be indexed at random by a key to return the value, and new pairs can be added, old pairs can be removed or old keys assigned new values. The Lua table is essentially a dictonary. Implementation challenges center on how to quickly resolve a given key, often via a *hash*. ### Other structures Sets, multisets, multimaps, trees, graphs, strings, ... ## Live Programming, Live Coding [Zen and the art of Live Programming](http://www.infoq.com/presentations/Live-Programming) [TOPLAP](http://toplap.org/) [Strange Places (Andrew Sorensen)](http://www.youtube.com/watch?v=TxQJPNSzl9s) ## Further reading [Notes on Programming (Alexander Stepanov)](http://www.stepanovpapers.com/notes.pdf) 1. code should be partitioned into functions; 2. every function should be most 20 lines of code; 3. functions should not depend on the global state but only on the arguments; 4. every function is either general or application specific, where general function is useful to other applications; 5. every function that could be made general – should be made general; 6. the interface to every function should be documented; 7. the global state should be documented by describing both semantics of individual variables and the global invariants. ----- ## Setting up a C development environment ** THIS SECTION IS INCOMPLETE ** Your operating system might not come with a compiler built-in, and it probably doesn't have all the libraries you need. Here's some quick notes on getting set up: ### Linux Getting a compiler: sudo apt-get install gcc For any other libraries, you can often get them with sudo apt-get install <libraryname> In this case, *sudo* means execute with administrative privileges, *apt-get* is a program for installing software and libraries (a "package manager"), *install* is the action for apt-get, and *gcc* is the application we will install (it will also install all dependencies). ### OSX Getting a compiler: For recent versions of OSX you must install command line tools manually. [You can do this from inside Xcode, or as a direct download from Apple](http://stackoverflow.com/questions/9329243/xcode-4-4-command-line-tools). Either way you will need to register with the Apple developer center. For any other libraries: There is no built-in "package manager" for OSX, but there are several that people have written. I currently recommend using one called "brew", which you can install by [following the instructions here](http://brew.sh). Once installed, run *brew doctor* to ensure your system is properly configured; do whatever it says until it is happy. Then, to install librares: brew install <libraryname> ### Windows Install a recent Visual Studio (e.g. Visual Studio 2012); this includes a compiler and a command line, found in the Start menu under "Visual Studio command prompt". Package managers for Windows are less usual; many libraries provide binaries or installers directly instead. [![stochastics](http://24.media.tumblr.com/tumblr_m26e2n7kQt1r8o4qmo1_500.gif)](http://0800-putaria.tumblr.com/)

GCT 633