Resizing VirtualBox disk using Vagrant

Recently, the box that I was using to try out Ansible ran out of memory. Resizing the disk turned out to be a bit of a pain.

First problem, we were using VMDK format for the virtual hard disk. This format cannot be resized. You would have to convert it into an expandable VDI format. But since root file system was on this drive, we could not take the risk of running resizing the file system on the drive. Hence we created a new virtual hard disk in the VDI format and attached it to the VM.

Once that was done, we had to just expand the logical volume on which / was mounted. On a high level, we had to

  1. Run fdisk on the drive and create a primary volume with an LVM label
  2. pvcreate 
  3. Add the newly created physical volume to the Volume group
  4. lvextend the root partition
  5. resize2fs

Once this was done, we repackaged the box and normal services were returned.

 

Advertisements

Database design at scale

When developing large scale enterprise applications, the choice and the design of the database is crucial to the financial and operational success of the system.

There are three factors that come into play when choosing a database.

  1. The data model
  2. Operational capabilities of the database in question
  3. Tooling around the database

These three factors are not independent of each other. Complications arise from the interdependencies between them. For example, a relational data model is much more suited adhoc aggregation and querying than any other data model. Consequently, there a number of analytic tools around relational databases. This makes it difficult for a systems engineer to analyse and compare the different database solutions available and choose the one that would suite his/her need the most.

Single node

The simplest approach to database design is to have a single server with all the data. With this approach, you would avoid a whole class of problems that come along with multi node approaches.

At large scales, though, your data and access requirements could quickly outrun the capability of the machine. When this happens, you would have no choice but to throw a lot of resources like memory, network bandwidth etc at the machine which could be expensive.

The single node approach suffers from another problem that the node forms a single point of failure for the database system. Ideally we would like to have redundant nodes which can take over should the node fail.

Replica nodes

To avoid the problem of single point of failure when having one node with all the data, we create “hot standby” nodes. The nodes are plain replicas of the single node to which all read and write operations are directed.

Though conceptually the idea of replicas are simple, it is fraught with operational issues which should be solved. Some of the problems are listed below.

New Node Warming

Imagine that we have a system operational in production with one node (which we would call master) and we would like to add a replica. For the replica to be considered valid, it should have all the data present in the master. If there is a large amount of data in the master, this could be a huge logistical issue.

Replication of meta data

In any database system, we would to create, modify and remove meta data like schemas, indexes etc. We would want this meta data, which could be non trivial in large systems to be replicated automatically as well.

Master election and automatic fail over

In the case of the failure of master node, we would like one of the replica nodes to take over automatically so that the system is always available to the end users. If there are many replicas, we call this the problem of “electing a new master”.

Once a new master is elected, all reads and writes are should directed to this node. This could be achieved using a middle ware which becomes aware of the new election and starts directing traffic to the newly elected node.

Another class of problems arise when the old master  comes back up again. Decisions should be made whether the newly elected master continues to be the master or whether the old master takes over. In either case, we would have to bring the old master up to date with the latest data before it could join the cluster.

Split brain problem

Automatic fail over demands the existence of a middle ware which routes all traffic. Now which middle ware becomes a single point of failure for the system. To avoid this, we would like to have redundancy in the middle ware level as well. Redundant systems with state is a huge time bomb.

Situations might arise when one part of the middle ware thinks that system A is the master and another part thinks that system B is the master. When this happens, the middle ware could start routing traffic to both the nodes; in which case, the nodes would diverge and could require manual intervention to bring them back into sync again.

Transaction overhead

To ensure that either all nodes see a change, or no node sees it is a problem in itself. This would require all the nodes to take part in a transaction which could be wasteful effort, since the chances that more than two nodes would fail is rare anyway.

Low utilization

Low utilization is the most fundamental problem with replica nodes. We would have invested in (number of replicas) times the resources that a single node would consume and all that we are gaining is mere automatic fail over of a node which is not likely to fail.

Multi master

A multi master configuration just treats all the nodes in the cluster as a master capable of handling reads and writes. This solves the problem of low utilization but comes with problems of its own

Reconciliation

Remember the split brain problem? A multi master configuration is split brain problem on steroids. At this scale, it is no longer possibly to manually reconcile divergent nodes and there should be an automated way of doing that.

Eventual consistency over Transactionality

Implementing distributed transactions in replica nodes which have data flowing in only one direction (from master to replicas) is hard enough. Implementing the same in a multi master set up could lead to very poor utilization of resources due to pessimistic locks usually associated with transactions.

Specificity

The most fundamental problem with a multi master setup though, is the fact that it could not be applied to all domains and all data models. A relational data model with its referential integrity constraint for example is not naturally suited for a multi master setup. Also some domains like banking which have data that should be globally consistent might not be inherently suited for eventual consistency and reconciliation.

Shards

Shards are the most complicated designs in a database system. Shards avoid the problem of low utilization by partitioning data across multiple nodes. They are designed around the fact that “not all data are created equal”.

There are some kinds of data that should be more consistent and available and some data like audits that need not be available and could be eventually consistent. In a domain specific manner, certain entities need not have any relation to other entities. For example, personal accounts in the banking systems could be treated separately from corporate accounts.

All these factors give us an opportunity to store data in separate data sources there by managing the sheer size of the database. As with any approach, sharding is not without its share of pitfalls.

Complex client/middleware

Once you have sharded the data, reads and write are more complicated since the client needs to know which shard to write to or read from. This could be abstracted away from the client using a middle ware, but still this is a problem for the system designer.

Resharding

Imagine that you have a database of people who sell items. If one person has too many items, you would want to move the person to a different shard. When you do this, you would have to move not only the person’s record but also all the items that belong to the person. Ideally we would like to do this in an automated fashion.

Availability

Certain portions of the database might not be available for read or write when automatic resharding is in progress.

Cross shard data movement

When your usage patterns change and you would have to combine data from across the shards, the amount of data movement over the network could be inhibitively expensive to say the least. In these cases, you would have to juggle around with your sharding scheme to avoid this data movement.

Distributed map reduce

Given that sharding is something that we would have to live with in large systems, distributed map reduce helps us reduce the processing time and data movement by partitioning the queries, distributing the work load and transporting only the result over the network. Such databases are said to follow the ACID 2.0 (Associative, commutative, Idempotent and distributed)  paradigm, for these group operations help in distributing and combining the results. But that is a topic for another post.

References

1. https://github.com/youtube/vitess

The programming language wars

People usually argue that one language is better than the other. Most of this is opinion based, but lets see if we could add method to the madness.

There are three contexts in which this problem could be analyzed.

  1. The Engineering aspects of the programming language.
  2. The user’s context
  3. The ecosystem

The engineering aspects

Certain languages offer better constructs from an engineering perspective. Better could be in terms of  number of unknown impacts of using the construct. The lesser the better. As an example, shared memory along with concurrency is a very poor construct just from an engineering perspective.

The user’s context

Just better engineering capabilities are not enough to decide the language to use. We must consider the “average” programmer in your team and what are their expertise. They could be legends in constructing programs using a set of constructs, in which case, it would be unwise to ask them to program using something new. If you have a problem with this, you have no choice but to change the team. If the expertise is unclear, as is in most cases, the better the engineering capabilities,  the better the language.

The ecosystem

The third main dimension is the community around the language. This is the community that is going to write libraries and tools that will make application development easier in your chosen language. There are two kinds of communities (mature and slow) vs (new and fast). Older communities are generally slow to adopt or innovate new techniques. Newer communities are typically faster at doing that. 

The important thing is to evaluate your context against these dimensions. There is no one global language which suites all contexts.

Why use open source operating systems

The scope of the post is just to convince you to use open source operating systems for your personal laptop or desktop. You might be using operating systems at different settings like servers etc where various other forces come into play.

The most reasonable

  1. Runs on your hardware. Not the ones they “recommend”.
  2. Lesser lock down compared other operating systems
  3. Constantly evolving and exciting
  4. More chance that more people who know their stuff are looking at your code. And no, just because it is from Apple, I don’t agree that the developers over there necessarily “know” what they are staring at.
  5. Software in your language(human).
  6. A chance to be a part of a diverse community. You will find all kinds of software from toy programs to the same systems that power a huge enterprise.

The niche

  1. If you don’t trust the compiled binary, compile it from the source yourself. NSA anyone?
  2. More opportunities to configure it the way you want it. Example, you want to use some flags when starting applications using the launcher like chrome flags.

ChkConfig and Systemd

These are service managers in redhat based and debian based systems respectively. 

I have worked mostly with chkconfig. Interface wise, it is much better than the windows service manager. They work on top of simple abstractions like files and directories, and as such give you immense power to customize the startup of services.

chkconfig allows you to specify run levels at which you should start the service. I have used this to manage the dependencies between the services. I am not sure it gives any other mechanism to specify this.

Vim modes

Finally after playing with vim for a long time, I could figure out three different modes for vim

1. Edit / Replace

2. Navigation

3. Command