Mojo By Example: A Comprehensive Introduction to the Mojo Programming Language

5. Types

Mojo provides a wide range of data types out of the box. The most commonly used ones are described below.

5.1. Bool

The simplest of all types, a Bool represents one of two values: True or False. Conceptually a Bool carries a single bit of information, 1 for True and 0 for False, although in memory it typically occupies a whole byte, since a byte is the smallest addressable unit. True and False are built-in constants in Mojo and are treated as keywords.

    var bool_value: Bool = True
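In practice, Bool values usually come from comparisons rather than from literals. The minimal sketch below shows a comparison operator producing a Bool and not negating it. The variable names are purely illustrative.

```mojo
fn main():
    var threshold: Int = 10
    # Comparison operators such as > evaluate to a Bool.
    var is_big: Bool = threshold > 5
    print(is_big)      # True
    # not flips a Bool value.
    print(not is_big)  # False
```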

5.2. Int

Int is one of the most commonly used data types in programming. It represents a mathematical integer, but there is a limit to how large a value it can store. Int is a built-in type in Mojo, and its size depends on the CPU architecture your program runs on: on a 64-bit architecture an Int is 64 bits wide, whereas on a 32-bit architecture it is 32 bits wide. How large a number fits in an integer type also depends on whether the type is "signed" or "unsigned". A signed integer can hold both negative and positive values. An unsigned integer cannot hold negative values. Int is signed, so on a 64-bit CPU architecture it can hold values from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807, both inclusive.

    var int_value: Int = 42

5.3. UInt

Like Int, UInt is a built-in type in Mojo whose size depends on the CPU architecture your program runs on. The main difference is that UInt is unsigned: it represents only non-negative integers, that is, zero and the positive integers. Because no bit has to encode a sign, all bits are available for the magnitude, so its maximum value is roughly twice that of Int. For example, a 64-bit unsigned integer has the range 0 through 18,446,744,073,709,551,615.

    var uint_value: UInt = 84

5.4. IntLiteral

IntLiteral is the type of an integer value written directly in source code. It has arbitrary precision, but it cannot currently be represented at runtime when the value is larger than Int can hold. Mojo allows the underscore character "_" to be used as a separator in integer literals, which makes large numbers easier to read.

    var int_lit: IntLiteral = 10_000

In the code below, a very large value is divided using floor division (we will cover floor division later, together with the other operators). This shows one of the benefits of IntLiteral: calculations on literals are performed at compile time with very large precision. When you execute the code, it prints 10000.

    print(9999999999999999999999999999999999999999999//999999999999999999999999999999999999999)

An IntLiteral can be assigned to an Int, but not the other way around. An IntLiteral value always comes from Mojo source code, whereas the value of an Int may come from other sources, such as files or the network, as well as from source code. The same holds for all the other literal types in Mojo.
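The sketch below assumes a Mojo version with the IntLiteral behaviour described above. It shows two points. First, underscores do not change a literal's value. Second, an expression built only from literals may pass through values larger than Int can hold, as long as the final result fits when it is assigned to an Int.

```mojo
fn main():
    # Underscores are only visual separators; both literals have the same value.
    var a: Int = 1_000_000
    var b: Int = 1000000
    print(a == b)  # True

    # 10^20 does not fit in a 64-bit Int, but the division is computed on
    # IntLiteral values at compile time; only the result (5) must fit in Int.
    var c: Int = 100_000_000_000_000_000_000 // 20_000_000_000_000_000_000
    print(c)  # 5
```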

5.5. String

String is also one of the most commonly used data types in programming. It is a sequence of Unicode characters representing a piece of text. Unicode is a text encoding standard maintained by the Unicode Consortium. It defines more than a hundred thousand code points, covering characters in almost all of the world’s writing systems. Be careful when asking a String for its length. In recent Mojo versions, len() on a String does not count characters or grapheme clusters. Instead, it returns the number of bytes in the string's UTF-8 representation. For non-ASCII text such as "héllo" (5 characters, 6 bytes), the result is therefore larger than the number of visible characters.

However, to store or transport a String we need to represent it as a sequence of bytes. A popular encoding is UTF-8, which uses one to four bytes per character, depending on the character's Unicode code point (the integer value assigned to that character).
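The number of bytes UTF-8 needs for a character depends only on the range its code point falls into. The helper below sketches that rule. The name utf8_length is made up for this illustration and is not a standard library function. The helper assumes its input is a valid code point and does not validate it.

```mojo
fn utf8_length(code_point: Int) -> Int:
    # Bytes UTF-8 uses to encode one (assumed valid) Unicode code point.
    if code_point < 0x80:
        return 1  # U+0000 to U+007F: plain ASCII
    elif code_point < 0x800:
        return 2  # U+0080 to U+07FF
    elif code_point < 0x10000:
        return 3  # U+0800 to U+FFFF
    else:
        return 4  # U+10000 to U+10FFFF

fn main():
    print(utf8_length(0x41))     # 'A' -> 1
    print(utf8_length(0xE9))     # 'é' -> 2
    print(utf8_length(0x20AC))   # '€' -> 3
    print(utf8_length(0x1F600))  # emoji -> 4
```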

When receiving or sending strings through files or the network, always make sure you know which encoding is being used. Subtle defects often occur because a program expected a different encoding than the one it actually received.

Unlike string literals, a String in Mojo is mutable. You can, for example, append to a String variable in place with +=, while operators such as + return a new String.

    var strg: String = "Hello World!"
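Here is a minimal sketch of two common String operations: + builds a new String from two pieces, and += appends to an existing String variable. The variable names are illustrative only.

```mojo
fn main():
    var greeting: String = "Hello"
    # + builds a new String and leaves greeting unchanged.
    var message = greeting + " World!"
    print(greeting)  # Hello
    print(message)   # Hello World!

    # += appends to the existing String variable.
    greeting += ", Mojo"
    print(greeting)  # Hello, Mojo
```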

5.6. StringLiteral

When you write a string directly in source code, within double quotes or single quotes, the value is assigned the type StringLiteral.

Mojo allows one type of quote to be embedded in a string delimited by the other type. For example, you can use single quotes inside a string delimited by double quotes, and vice versa. However, make sure to use the same type of quote at the beginning and at the end of the string.

    var strg_lit: StringLiteral = "Hello World!"
    var strg_lit2: StringLiteral = 'Hello World!'
    var strg_lit3: StringLiteral = 'Hello "World"!'
    var strg_lit4: StringLiteral = "Hello 'World'!"

You can define multi-line strings using three double quotes (""") or three single quotes ('''). Multi-line strings preserve newline characters and whitespace, including any indentation inside the quotes.

    var strg_lit_multi: StringLiteral = """
    Hello World!
    """
    var strg_lit_multi2: StringLiteral = '''
    Hello World!
    '''
    var strg_lit_multi3: StringLiteral = '''
    Hello """World"""!
    '''
    var strg_lit_multi4: StringLiteral = """
    Hello '''World'''!
    """

A StringLiteral can be assigned to a String. This is why you can initialize a String variable with a string literal written in source code.

5.7. FloatLiteral

FloatLiteral is the type the Mojo compiler assigns to a numeric value written with a decimal point in source code. A FloatLiteral has "double precision" and is represented with 64 bits: 52 bits for the mantissa, 11 bits for the exponent, and the remaining bit for the sign.

    var float_lit: FloatLiteral = 2.005

5.8. Float16

Float16 is a 16-bit floating point type, also known as "half precision". It uses 1 bit for the sign, 5 bits for the exponent and 10 bits for the mantissa. On some hardware, lower-precision types can be much faster than higher-precision ones. They are therefore useful when high precision is not important in your domain.

    var float_16: Float16 = 1.011

5.9. Float32

Float32 is a 32-bit floating point type, also known as "single precision". It has a 23-bit mantissa, an 8-bit exponent and one bit for the sign.

    var float_32: Float32 = 3.25
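The bit layout above determines a value through the IEEE 754 formula for normal numbers. As a worked example, the Float32 value 3.25 from the snippet decomposes as follows. Here s is the sign bit, e is the 8-bit exponent field (stored with a bias of 127), and f is the 23-bit fraction field read as an unsigned integer:

```latex
x = (-1)^{s} \times \left(1 + \frac{f}{2^{23}}\right) \times 2^{\,e - 127}

3.25 = 1.625 \times 2^{1}
\;\Rightarrow\; s = 0,\quad e = 1 + 127 = 128,\quad f = 0.625 \times 2^{23} = 5\,242\,880

(-1)^{0} \times \left(1 + \frac{5\,242\,880}{8\,388\,608}\right) \times 2^{128 - 127} = 1.625 \times 2 = 3.25
```

Float64 follows the same formula, with a 52-bit fraction (so 2^52 in the denominator) and an exponent bias of 1023.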

5.10. Float64

Float64 is a 64-bit floating point type, also known as "double precision". Its 64 bits are split into 52 bits for the mantissa, 11 bits for the exponent and one bit for the sign. This is the same precision as FloatLiteral.

    var float_64: Float64 = 5.6

5.11. Int8

Int8 is a signed integer represented with 8 bits. Its values range from -128 to 127. Integer types with fewer bits save memory and can also be used to enforce a supported range of values. As with floats, the sign is encoded in a single bit: in Int8, the most significant bit is set for negative values.

    var int_8: Int8 = -20

5.12. UInt8

Like Int8, UInt8 is represented with 8 bits, but it is an unsigned integer. Its range is therefore 0 to 255.

    var uint_8: UInt8 = 20

5.13. Int16

Int16 is represented with 16 bits. It has a range of values from -32,768 to 32,767.

    var int_16: Int16 = -29

5.14. UInt16

UInt16 is also represented with 16 bits. It has a range of values from 0 to 65,535.

    var uint_16: UInt16 = 34

5.15. Int32

Int32 is represented with 32 bits. It has a range of values from -2,147,483,648 to 2,147,483,647.

    var int_32: Int32 = -78

5.16. UInt32

UInt32 is represented with 32 bits. It has a range of values from 0 to 4,294,967,295.

    var uint_32: UInt32 = 87

5.17. Int64

Int64 is represented with 64 bits. It has a range of values from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.

    var int_64: Int64 = -65

5.18. UInt64

UInt64 is represented with 64 bits. It has a range of values from 0 to 18,446,744,073,709,551,615.

    var uint_64: UInt64 = 77
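All of the integer ranges listed above follow from two formulas, where n is the number of bits in the type:

```latex
\text{signed } n\text{-bit}: \quad -2^{\,n-1} \le x \le 2^{\,n-1} - 1
\qquad
\text{unsigned } n\text{-bit}: \quad 0 \le x \le 2^{\,n} - 1

n = 8: \quad -2^{7} = -128 \le x \le 2^{7} - 1 = 127, \qquad 0 \le x \le 2^{8} - 1 = 255
```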

5.19. BFloat16

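BFloat16, short for "brain floating point", is a 16-bit floating point format. It uses 1 bit for the sign, 8 bits for the exponent and 7 bits for the mantissa. As a result, it covers roughly the same range of values as Float32, but with much less precision. Its main use is in machine learning, where halving memory use and bandwidth compared with Float32 improves the performance of ML algorithms.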
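Here is a minimal sketch of declaring a BFloat16 value and widening it to Float32 with cast. It assumes your Mojo version and target hardware support the bfloat16 type. The value 1.5 was chosen because it is exactly representable with a 7-bit mantissa.

```mojo
fn main():
    # BFloat16 layout: 1 sign bit, 8 exponent bits, 7 mantissa bits.
    var bfloat_16: BFloat16 = 1.5
    # Widening to Float32 keeps the exponent size and adds mantissa bits,
    # so an exactly representable value like 1.5 is preserved.
    var as_f32 = bfloat_16.cast[DType.float32]()
    print(bfloat_16, as_f32)
```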


5.20. SIMD

SIMD stands for Single Instruction, Multiple Data. Processors that support SIMD can apply exactly the same instruction to multiple data points in parallel. SIMD was first implemented in supercomputers, but over time it came to be used in desktop computers as multimedia consumption on desktops increased. SIMD is especially beneficial for vector and matrix operations, where the same operation often needs to be applied to many elements of those data structures.

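Mojo provides out-of-the-box support for SIMD. Most of the base types mentioned above are built on top of Mojo’s SIMD type. For example, fixed-width numeric types such as Int8, UInt32 and Float64 are SIMD vectors with a single element.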

    var simd1: SIMD[DType.int8, 4] = SIMD[DType.int8, 4](10)
    var sc: Int8 = 3
    print(simd1 * sc)

In the code above, a SIMD vector of 4 Int8 elements is created, with the value 10 assigned to every element. When we multiply it by the value 3, each element is multiplied by that scalar, resulting in [30, 30, 30, 30]. On supported hardware, a single instruction is applied to all 4 elements at the same time to produce the resulting values.
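SIMD operations are not limited to vector-by-scalar multiplication. The minimal sketch below adds two vectors element by element, reads a single lane, and sums all lanes with reduce_add. The element type int32 and the vector width of 4 are arbitrary choices for illustration.

```mojo
fn main():
    # Two 4-element vectors of 32-bit integers, built from individual values.
    var a = SIMD[DType.int32, 4](1, 2, 3, 4)
    var b = SIMD[DType.int32, 4](10, 20, 30, 40)
    # + works element by element: 1+10, 2+20, 3+30, 4+40.
    var c = a + b
    print(c)
    # Individual lanes can be read with a zero-based subscript.
    print(c[0])            # 11
    # reduce_add sums all lanes into a single scalar: 11 + 22 + 33 + 44.
    print(c.reduce_add())  # 110
```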

5.21. DType

In the previous example, you saw a SIMD instance being created by passing the data type DType.int8. DType in Mojo enumerates the data types that Mojo supports. One use of DType is to pass data types as arguments to functions. DType also provides methods for inspecting attributes of a data type at runtime. DType is particularly useful for compile-time optimization, because it allows the compiler to generate code specialized for a particular type.

    fn introspect(type: DType):
        print("Bit width:", type.bitwidth())
        print("Is signed:", type.is_signed())
    
    introspect(DType.float16)

The example above defines a generic function that takes any DType and prints its bit width and whether it is a signed type.
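The compile-time specialization mentioned earlier happens when a DType is passed as a parameter in square brackets rather than as a runtime argument. The following sketch illustrates this. The function name sum_four is hypothetical, and the code relies on Mojo inferring the parameter from the argument's type.

```mojo
fn sum_four[dtype: DType](v: SIMD[dtype, 4]) -> Scalar[dtype]:
    # dtype is a compile-time parameter: the compiler generates a separate,
    # specialized version of this function for each element type it is used with.
    return v.reduce_add()

fn main():
    # dtype is inferred from the argument's type in both calls.
    print(sum_four(SIMD[DType.int8, 4](1, 2, 3, 4)))  # 10
    print(sum_four(SIMD[DType.float32, 4](0.5, 0.5, 0.5, 0.5)))
```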

5.22. Type safety

Let’s try something. Execute the following code in Mojo.

def main():
    var int_value: Int = "42"

Executing the code listed above results in:

error: cannot implicitly convert 'StringLiteral' value to 'Int' in 'var' initializer
    var int_value: Int = "42"

The reason is simple: Mojo is strongly typed. When you declare a variable of type Int, Mojo expects either Int values or values that can be converted to Int. In this case, we tried to assign a string literal to an Int variable, and the Mojo compiler did not allow it. If Mojo were less strict, we could end up with defects where we assume a variable has a particular type when in reality it does not. This is a particular concern in large code bases worked on by many people.
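When the text really does contain a number, it can be converted explicitly instead of implicitly. The sketch below uses the built-in atol function, which parses an integer from a string and raises an error if the text is not a valid integer. Because of that possible error, main is written with def. The sketch assumes atol is available in your Mojo version.

```mojo
def main():
    # Parse the text explicitly; atol raises if "42" were not an integer.
    var int_value: Int = atol("42")
    print(int_value + 1)  # 43
```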

Now let’s look into the following.

def main():
    var string_val: String = String(42)
    print(string_val)

The code shown above compiles, runs successfully and prints 42. The reason is a bit less obvious. String provides an initializer that takes an integer as its argument, and String(42) calls that initializer explicitly. Mojo can also call such single-argument initializers implicitly when a value of one type is used where another type is expected, provided the initializer permits implicit conversion. We will cover initializers later on.

5.23. object

As you have seen, Mojo is quite strict about types. But what if you do not yet know, or do not care about, the type of a variable and still want to perform some computation? For such cases, Mojo provides the object type.

    fn add(a: object, b: object) raises -> object:
        return a + b
    print(add(1, 2.5))

If you execute the code above, you will see 3.5 printed on screen. Mojo does not complain that the two arguments have different types because object has initializers for many built-in data types. As with the String example above, Mojo calls the object initializer that corresponds to the type of the given value. If object has no initializer for a given type, values of that type cannot be assigned to variables of type object.

In the case above, object has initializers for both Int and FloatLiteral. Mojo therefore creates one object holding an Int and another holding a FloatLiteral as its underlying value.

In def functions, when you omit the type annotation on a variable, an argument or the return value, Mojo automatically gives it the type object.
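Here is a minimal sketch of an unannotated def function. It assumes a Mojo version in which omitted annotations in def functions default to object, as described above. The function name add_untyped is hypothetical.

```mojo
def add_untyped(a, b):
    # No annotations: a, b and the return value are treated as object.
    return a + b

def main():
    print(add_untyped(1, 2.5))  # 3.5
    print(add_untyped(2, 3))    # 5
```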

5.24. Tuple

A Tuple in Mojo is an ordered sequence of values. A Tuple can hold elements of different types. Mojo uses () to write tuple literals in source code.

    var t: Tuple[Int, Bool, Float64] = (1, False, 3.5)

The code above defines a tuple with the elements 1, False and 3.5. You may have noticed that its type declaration lists some parameters within square brackets. We will come back to these in a later chapter.

You can also get the length of a tuple using Mojo’s built-in function len, as shown below.

    print(len(t))

An empty tuple can be defined using just ().

    var e: Tuple = ()
    print(len(e))

Earlier we declared a tuple with Tuple[Int, Bool, Float64]. We can also declare it as (Int, Bool, Float64). Both declarations are equivalent.

    var altr: (Int, Bool, Float64) = (1, False, 3.5)
    print(len(altr))

To get an element of a tuple, use the subscript operator [] and pass the index within the square brackets. As in most other languages, indexes in Mojo are zero-based. Subscripts can also be used with lists.

    var access: (Int, Bool, Float64) = (1, False, 3.5)
    print("First value", access[0])

In def-style functions, you can unpack the values of a tuple into individual variables, and each variable gets the data type of the value assigned to it. The first variable on the left-hand side receives the first value of the tuple on the right-hand side, the second variable receives the second value, and so on.

    def multi_vars():
        a, b = (1, False)
        print("Variables a & b:", a, b)
    multi_vars()
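Tuple unpacking is especially handy when a function needs to return more than one value. In the sketch below, a hypothetical helper min_max returns a tuple, and the caller unpacks it in a single statement.

```mojo
def min_max(a: Int, b: Int) -> (Int, Int):
    # Return the smaller value first and the larger value second.
    if a < b:
        return (a, b)
    return (b, a)

def main():
    # Unpack the returned tuple into two variables in one statement.
    lo, hi = min_max(7, 3)
    print(lo, hi)  # 3 7
```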

5.25. ListLiteral

Like Tuple, ListLiteral is supported in Mojo. A ListLiteral can hold elements of different types. Mojo uses [] to write list literals in source code.

    var l: ListLiteral[Int, Float64] = [1, 3.5]
    print(len(l))

The code above defines a list with the elements 1 and 3.5. As with Tuple, some parameters are defined within square brackets; we will cover them later on.

An empty list can be defined using just [].

5.26. DictLiteral

A dictionary maps keys to values. To get an element of a dictionary, use the subscript operator [] and pass the key under which the value is stored.
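The following minimal sketch shows key-based access. It uses the standard library's Dict type from the collections module rather than a dictionary literal, and assumes Dict is available in your Mojo version. The keys and values are made up for illustration. Reading a missing key raises an error, which is why main is written with def.

```mojo
from collections import Dict

def main():
    var ages = Dict[String, Int]()
    # Store values under their keys.
    ages["alice"] = 30
    ages["bob"] = 25
    # Read a value back by passing its key within [].
    print(ages["alice"])  # 30
```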

5.27. Type inference

When Mojo can infer the type of a variable, we can omit its type annotation. For example, if a variable is initialized when it is declared, the Mojo compiler can infer its type from the initial value.

In the following example, even though we do not declare the types of bool_value2 and int_value2, Mojo is able to infer the types as Bool and Int respectively.

    var bool_value2 = True
    var int_value2 = 1
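Type inference does not weaken type checking: an inferred type is fixed just as if it had been written out. The sketch below notes in comments the type each variable is inferred to have (integer literals become Int, float literals become Float64). It then uses one of them where an Int is required.

```mojo
fn main():
    var count = 10       # inferred as Int
    var ratio = 0.5      # inferred as Float64
    var done = False     # inferred as Bool
    # count is an Int, so it can initialize another Int variable.
    var next_count: Int = count + 1
    print(count, ratio, done, next_count)
```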