Overview
This database application is designed to act as a package for other Java applications. Its functionality is derived from the core Database object, which may be imported and used to manage a database. All methods are accessed through this database object, either directly or through its two component objects, query and schema. The intention behind this is to create a simple and clear API for other developers to use when they wish to leverage this database system. A simple program, Demo, has been created to demonstrate how this package may be used in an application.
This project has been built against Java 8u40, and uses Apache Ant (v1.9.4) for building and running tests. Please view the README for instructions on how to build and test the application.
Documentation
Documentation for the source code was generated using the javadoc tool. It contains information about methods and classes, including parameters, return values, and the version in which they were introduced. The generated docs may be viewed in the doc directory, via index.html.
Class by Class Breakdown
Record
This class handles storing a single record in the database, i.e. a row. It stores the values in the row in a list of Strings. This list is kept private to prevent possible corruption of the information by any careless external code. Furthermore, a list is used here in place of a simple array due to the fact that the data may be queried or altered fairly often. The ArrayList class offers a number of useful methods that simplify implementing these requirements.
The methods for this class consist largely of getters and setters, for retrieving and updating the data stored, in addition to options for adding and removing data fields. None of the methods, nor the class, are visible outside of the package, as it is intended that users will view the data at a higher level (tables) than through individual records. This is also better for data integrity, as data can then only be modified through clearly defined interfaces.
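The Record design described above might be sketched roughly as follows. This is only an illustration: the method names (get, set, addField, removeField, size) are assumptions and are not taken from the actual source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a package-private row holder (no public modifier).
class Record {
    // Private so that external code cannot modify the list directly.
    private final List<String> values;

    Record(String... values) {
        // Copy into an ArrayList so fields can be added and removed later.
        this.values = new ArrayList<>(Arrays.asList(values));
    }

    String get(int index) { return values.get(index); }

    void set(int index, String value) { values.set(index, value); }

    // Appends a field, e.g. when a column is added to the owning table.
    void addField(String value) { values.add(value); }

    // Removes and returns the field at the given position.
    String removeField(int index) { return values.remove(index); }

    int size() { return values.size(); }
}
```

Because neither the class nor its methods are public, code outside the package cannot reach the values at all, which matches the intent stated above.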
Table
This class provides an object for the common core component of a database, the table. It stores a group of records (rows) as a LinkedList. This is done for performance reasons, as it will likely be common for records to be added or removed. In a linked list this should only require a simple update of two pointers and possibly an allocation of a single element, whereas an array or ArrayList will likely require reallocation of the entire entry set. This class also stores the columns for the table as a list of Strings, with an ArrayList being used here for the same reasons as for Record values, as discussed above.
The table also contains an instance of the Printer object, which is covered in more detail below. The reason for keeping the code for printing tables in an independent object is to separate out user interaction (shell printout) from the logical code contained in the table object. The reason for connecting it to the table at all is to provide a simplified interface for the user. Tables may be printed simply by accessing the printer object they contain, rather than by having to instantiate a new object and pass the table to it. Furthermore, the printer object itself is only used to print tables, so functionally it only makes sense when tied to the Table object.
The collection of methods that come with Table mainly deal with retrieving and altering information about the columns and rows. Whenever data is retrieved it is presented in standard arrays, as opposed to lists, in an attempt to simplify the user's interaction with the class. It is intended that the user should be able to choose whether or not they want to engage higher-level data structures, like lists, in their code. Furthermore, as with the Record class, the Table class and its methods are package-private, as it is intended that the user will interact with the database through clearly-defined methods at the level of the Database class.
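A minimal sketch of this arrangement is given below, assuming rows can be represented as plain String arrays rather than Record objects. The method names are hypothetical; the point is only to show a LinkedList of rows, an ArrayList of columns, and getters that hand back standard arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical, simplified table: not the package's actual Table class.
class Table {
    private final List<String> columns;
    // Linked list, as rows are expected to be added and removed often.
    private final LinkedList<String[]> rows = new LinkedList<>();

    Table(String... columns) {
        this.columns = new ArrayList<>(Arrays.asList(columns));
    }

    void addRow(String... values) {
        if (values.length != columns.size()) {
            throw new IllegalArgumentException("Expected " + columns.size() + " values");
        }
        rows.add(values.clone());
    }

    // Returns a fresh array, so callers never see the internal list.
    String[] getColumns() { return columns.toArray(new String[0]); }

    // Returns copies of the rows as a plain 2D array.
    String[][] getRows() {
        String[][] out = new String[rows.size()][];
        int i = 0;
        for (String[] row : rows) {
            out[i++] = row.clone();
        }
        return out;
    }
}
```

Returning copies means a caller can alter the arrays it receives without affecting the data held by the table.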
Database
This class is considered the core object for the database application. It holds the set of Table objects that make up the database, and provides the user with access via the Query and Schema objects. The idea is that all functionality of the database may be used in an application through the Database object, and thus this is the only class that has a public constructor.
The tables contained in the database are stored as a Map, with the table name as the key and the Table object as the value. This provides rapid retrieval of the tables and allows a simple check on whether a given table exists. The database itself is tied to a particular directory on disk, as it needs somewhere to store the table files (handled by the DataFile object). The location of this directory is stored in the dataDir variable, and when a Database object is instantiated, any table files stored here will be loaded into memory.
Access to table data, and the ability to update and modify it, is provided through the Query object, an instance of which is created when the Database is instantiated. The intention was to create a clean, simple API for a user, whereby database operations can be performed through a simple db.query.<query_type>() method call. All operations on the tables in the database go through this query interface, apart from commit(), createTable() and dropTable(), which are called directly on the database object because they alter data held in that object. They could have been included in the query class, but the preference was to maintain data encapsulation as much as possible. The database also contains an instance of the Schema object, which is discussed in the Schema section below.
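The shape of this API could look something like the sketch below. It is deliberately reduced: tables are stood in for by lists of String arrays, disk handling is left out, and the insert, count, hasTable and rowsOf methods are assumptions made for illustration. Only createTable(), dropTable() and the db.query.<query_type>() call pattern come from the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical query facade; the real Query class offers more operations.
class Query {
    private final Database db; // reference back to the owning database

    Query(Database db) { this.db = db; }

    void insert(String table, String... row) { db.rowsOf(table).add(row); }

    int count(String table) { return db.rowsOf(table).size(); }
}

class Database {
    // Table name -> rows; a stand-in for the Map of Table objects.
    private final Map<String, List<String[]>> tables = new HashMap<>();

    // Component object, created with the database, enabling db.query.<query_type>().
    public final Query query = new Query(this);

    // Structural changes alter the map itself, so they live on Database.
    public void createTable(String name) {
        if (tables.containsKey(name)) {
            throw new IllegalArgumentException("Table already exists: " + name);
        }
        tables.put(name, new ArrayList<>());
    }

    public void dropTable(String name) {
        if (tables.remove(name) == null) {
            throw new IllegalArgumentException("No such table: " + name);
        }
    }

    // The map makes the existence check a single lookup.
    public boolean hasTable(String name) { return tables.containsKey(name); }

    // Package-private accessor used by Query.
    List<String[]> rowsOf(String name) {
        List<String[]> rows = tables.get(name);
        if (rows == null) {
            throw new IllegalArgumentException("No such table: " + name);
        }
        return rows;
    }
}
```

A caller would then write, for example, db.createTable("people") followed by db.query.insert("people", "1", "Ada").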
DataFile
This class is responsible for all manipulation of files on disk. Its purpose is to read and write table objects to disk whenever it is required to do so by the main Database object. When a database is instantiated, this object is called upon to read and return all the tables stored in the specified data directory. For this reason it maintains the location of the data directory where these files are stored in dataDir. Whenever the commit() method is called from the database object, the saveTable() and deleteTable() methods in this object are used to update the tables stored on disk to keep them in line with those stored in memory.
To store table objects in data files, this class makes use of Java's object serialisation capabilities. It takes a table, including any objects it contains, and converts the state into a byte stream that may be saved to disk. For this reason, some of the classes in this package, including Table and Record, implement the Serializable interface, which allows them to be manipulated in this way. The decision to create the data files in this way, instead of using delimited text files, was made to simplify the DataFile class, and to hopefully reduce the risk of bugs due to incorrectly formatted data. Finally, the DataFile object also contains a simple method used to retrieve a list of the tables stored on disk.
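As an illustration of the serialisation approach, the sketch below writes a small Serializable object to a file and reads it back using the standard ObjectOutputStream and ObjectInputStream classes. The StoredTable and DataFileSketch names, and the save and load methods, are hypothetical stand-ins and not the package's own saveTable() implementation.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a table; every field it holds must be serialisable too.
class StoredTable implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final List<String> columns = new ArrayList<>();

    StoredTable(String name) { this.name = name; }
}

class DataFileSketch {
    // Converts the table's state to a byte stream and writes it to the file.
    static void save(StoredTable table, File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(table);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuilds the table from the byte stream stored in the file.
    static StoredTable load(File file) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (StoredTable) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Stored class is not on the classpath", e);
        }
    }
}
```

No file format has to be designed or parsed here, which is the simplification the paragraph above refers to. The trade-off is that the files are binary and tied to the class definitions that wrote them.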
Query
This class is responsible for almost all of the operations performed on the data. The intention was to provide a single, clear API that could be leveraged by a user to interact with the database. Containing it largely in this single class reduces the complexity created by having to use methods spread out across a number of different objects. Furthermore, it should make it relatively straightforward for an application developer to produce an SQL-like textual or graphical interface for the database, as most calls will simply be wrapper methods around those defined here.
The query object is instantiated along with the main database object. It retains a reference to this main object in db, so that it may carry out its operations on the tables stored there. A number of its methods, such as add, rename and delete, simply combine methods already present in the Schema and Table classes. The select method, however, is rather more complex, and is split across a number of other private methods. In general, the methods defined here are designed to mimic a subset of the functionality defined in SQL. Also, note that the insert method has been overloaded, to allow rows to be added either one at a time, or en masse.
ResultTable
This class has been designed primarily to work with the select method present in the query class. Originally, the select was intended to produce a 2D array containing the rows requested. However, this left the problem of including column names in the result. Therefore, it was decided that since a table object already existed to handle table-like data, it made sense to use it to present results of queries too, just as many SQL databases do. This also had the benefit of allowing the user access to a number of methods like renameColumn() and addRow() for free, which may be useful for them in formatting and presenting their data.
However, the default Table class contained certain methods that it would be best for the user not to have access to, and also came with the restriction of primary keys, which are not necessary on a result set. It was considered that perhaps some methods, such as those that deal with primary keys, could be made package-private, and public versions without the key restrictions could be added in their place. This, though, meant bloating the Table class with methods unnecessary to its default use-case. Therefore, as the result table was a logical extension of the Table class in any case, the decision was made to subclass Table and override methods as needed. That way, any methods that the subclass needed could become protected (still keeping them package-private), and any that it did not need could remain as they were.
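The subclassing decision can be illustrated with a small sketch. It assumes, purely for the example, that the primary-key rule is enforced by a single protected method which the subclass overrides; the KeyedTable, ResultTableSketch and checkKey names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical base table that treats the first value of each row as its key.
class KeyedTable {
    private final List<String[]> rows = new ArrayList<>();
    private final Set<String> keys = new HashSet<>();

    // Assumes every row has at least one value (the key).
    void addRow(String... values) {
        checkKey(values[0]);
        rows.add(values);
    }

    // Protected hook: rejects a key that has been seen before.
    protected void checkKey(String key) {
        if (!keys.add(key)) {
            throw new IllegalArgumentException("Duplicate primary key: " + key);
        }
    }

    int rowCount() { return rows.size(); }
}

// Result sets may repeat values in any column, so the key check is overridden away.
class ResultTableSketch extends KeyedTable {
    @Override
    protected void checkKey(String key) {
        // Intentionally empty: no primary-key restriction on query results.
    }
}
```

The base class keeps a single addRow method, and only the subclass knows that the restriction has been lifted.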
Printer
This class was originally standalone, and required a table object to be passed to it for printout. However, it was decided that all the functionality it provided could be considered simply an extension to the Table class, so it made sense to include it as a component of a table object. Thus, when a table is instantiated, a printer object is instantiated with it, and retains a reference to the outer Table object in the variable table. The reason for still maintaining Printer as a separate class is to avoid polluting the Table class with a number of user-facing printout methods, which are not relevant to its core functionality.
There are relatively few methods included with Printer. A couple of private methods exist for printing a single row and multiple rows respectively. The only two public methods are columns, for printing out just the columns of a table, and all, for printing a table in its entirety. These methods may be accessed from a table object (likely a ResultTable object) via table.print.<print_method>().
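The component relationship between a table and its printer might be sketched as follows. The PrintableTable and TablePrinter classes, the renderColumns and renderAll helpers, and the " | " separator are all assumptions made so the example is self-contained and checkable; only the print field and the columns and all method names follow the description above.

```java
// Hypothetical table that creates its own printer component.
class PrintableTable {
    final String[] columns;
    final String[][] rows;

    // Instantiated with the table, enabling table.print.<print_method>().
    final TablePrinter print = new TablePrinter(this);

    PrintableTable(String[] columns, String[][] rows) {
        this.columns = columns;
        this.rows = rows;
    }
}

class TablePrinter {
    private final PrintableTable table; // reference to the owning table

    TablePrinter(PrintableTable table) { this.table = table; }

    // Formats a single row of cells.
    private String row(String[] cells) { return String.join(" | ", cells); }

    String renderColumns() { return row(table.columns); }

    String renderAll() {
        StringBuilder sb = new StringBuilder(renderColumns());
        for (String[] r : table.rows) {
            sb.append('\n').append(row(r));
        }
        return sb.toString();
    }

    // User-facing methods: print the header only, or the whole table.
    void columns() { System.out.println(renderColumns()); }

    void all() { System.out.println(renderAll()); }
}
```

Keeping the formatting code in TablePrinter leaves PrintableTable free of shell output concerns, while the print field still gives the caller a one-line way to display a table.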
Schema
This class is designed to implement catalog-like functionality. Its purpose is to maintain metadata about the database, such as a list of tables and their columns. It stores this information in a similar way to the main database, in that it uses a Map. However, in this case, although the keys are still table names, the values are now simply String lists containing column names. It also keeps a record of the number of tables in the database. As with other classes, such as Printer, this class provides peripheral functionality to another class, in this case Database. Thus an instance of it is created whenever a database object is generated, and it is then accessed through the database object via db.schema.<method>.
Most of the methods contained within this class are package-private, and are designed to keep the schema up to date as changes are made to the tables in the database. The data contained here is not designed to be altered directly by the user, as that would result in it not matching the state of the tables, so the only publicly accessible methods here are designed for retrieving data. The table() method simply gets an array of columns for a specific table. The all() method, however, returns a Map of the table names, similar to how they are stored in the schema, but with a standard array instead of a list. Although a simple array may have been preferable, a Map was used because it seemed the only sensible way to present the data. It would have been over the top to have used a completely new object, as ResultTable was used for table results, for the simple case of retrieving basic schema information.
Also, it may be noted that some of the Schema methods are quite similar to those found in the Table class. Although schema data could have been contained directly in the main database class, and tied directly to the data contained within table objects, this would not have been in keeping with the principle of data encapsulation. As the schema was conceptually a separate entity, it made sense to maintain this data in its own class and not spread it across other classes, along with the cross-class manipulation of data that this would have entailed.
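A rough sketch of such a schema object is shown below. The table() and all() methods follow the description above, while addTable, dropTable and tableCount are hypothetical names for the package-private bookkeeping methods and the table counter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical catalogue of table names and their column names.
class Schema {
    private final Map<String, List<String>> tables = new HashMap<>();

    // Package-private mutators, called as the tables themselves change.
    void addTable(String name, String... columns) {
        tables.put(name, new ArrayList<>(Arrays.asList(columns)));
    }

    void dropTable(String name) { tables.remove(name); }

    public int tableCount() { return tables.size(); }

    // Columns of one table, as a plain array.
    public String[] table(String name) {
        List<String> columns = tables.get(name);
        if (columns == null) {
            throw new IllegalArgumentException("No such table: " + name);
        }
        return columns.toArray(new String[0]);
    }

    // All tables, as a new Map holding array copies of the column lists.
    public Map<String, String[]> all() {
        Map<String, String[]> copy = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : tables.entrySet()) {
            copy.put(entry.getKey(), entry.getValue().toArray(new String[0]));
        }
        return copy;
    }
}
```

Since both public methods return copies, changes made by a caller to the returned arrays or Map cannot put the schema out of step with the tables.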
Demo
This simple program has been created to demonstrate how the db package may be used, via the built-in API, in an application. It may be run with the command ant demo or java -classpath bin Demo from the root directory (the one with build.xml in it). This program shows how the Database object may be imported from the package, and used as the entry-point for the database system.
A connection is made to the database, which loads up any table file present in the specified directory. Then a new table is created and some values inserted into it. The functionality of the commit method is shown next, as the created table is written back to disk. Next the query API is demonstrated, with a selection being made on the first and last columns. Some modifications are then made to the columns, and the result viewed using a schema method. Finally the table is dropped, and the changes committed to delete it from disk.
The use of Exceptions is shown here to provide flexibility to a developer using the API. In situations such as a database connection failure it may be necessary to terminate program execution, since subsequent interactions require the connection to exist. However, where a select does not work, it may be due to an error on the end user's part, so the exception can instead be used to present a warning message without crashing the program.
Testing
Each class contains a private test method. These methods make use of Java's assert statement to make sure that the class is behaving as it is supposed to. On the occurrence of a problem, the failed assertion will throw an AssertionError with a stack trace (provided assertions are enabled, for example with the -ea flag), which can be used to track down the bug. The test methods are also declared to throw generic Java Exceptions, so that if another issue occurs and throws an Exception, the information about this will be passed up to the tester in the form of another stack trace, to aid with debugging. This has proved quite useful for tracing unexpected bugs, sometimes in classes completely separate to the one currently being tested.
The main method for a class is used to run its testing method, and either inform the tester if everything passed, or print a stack trace if not. This keeps testing modular and tied closely to the classes that it acts upon. It also allows tests to be updated in a relatively straightforward manner. In order to run all of the tests together, a target has been added to the ant build file, so simply entering ant test will run the tests on all classes.
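The testing pattern described in this section might look roughly like the following. Counter is a made-up class used only to keep the example short, and the exact structure of the real test methods is an assumption. Note that Java only evaluates assert statements when assertions are enabled, for example with java -ea Counter.

```java
// Hypothetical class carrying its own private test method.
class Counter {
    private int value;

    void increment() { value++; }

    int get() { return value; }

    // Declared to throw Exception so that unexpected failures propagate to main.
    private static void test() throws Exception {
        Counter c = new Counter();
        assert c.get() == 0 : "a new counter should start at zero";
        c.increment();
        assert c.get() == 1 : "increment should add one";
    }

    // Runs the tests, then reports success or prints the stack trace.
    public static void main(String[] args) {
        try {
            test();
            System.out.println("Counter: all tests passed");
        } catch (Throwable t) {
            // Throwable covers both AssertionError and ordinary Exceptions.
            t.printStackTrace();
        }
    }
}
```

Because test() is a member of the class, it can exercise private state directly, which is one practical benefit of keeping tests inside the class under test.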
Development Timeline
This section details the development process used to produce the database application. Instead of attempting to develop all the different components in parallel, a modular approach was taken. The different features to be implemented were split up, and the code was written according to a release schedule, with each version targeting specific features. This approach tied quite nicely into Java's preference for abstracting conceptually distinct functionality into individual classes. In fact, for the most part, individual releases tended to correspond to the addition of a new class to the codebase.
A snapshot of the code at each stage of the development has been included in the releases section, with each snapshot named according to the releases described here. They are treated as minor releases, with the complete project being treated as a major release, named v1.0-final.
Release v0.1
This is the first minor release, and mainly deals with creating the project structure. It also includes the first class, Record, which stores rows in the database as objects. The first set of docs were generated for this release.
- Created Record class, designed to store data in the database.
- Added docs using javadoc.
- Created first set of unit tests for Record.
- Added README.
Release v0.2
This release focuses on the creation of a Table class, representing the core unit of data encapsulation in a database: the table.
- Created Table class, which wraps up a set of Records. Also manages column information.
- Tweaked Record class, added methods for adding and removing fields, to work with the Table functions for adding and removing columns.
- Added unit tests for Table, and for Record changes.
- Updated docs.
Release v0.3
The focus of this release was the DataFile class, which is responsible for reading and writing of data to disk. It makes use of Java's object serialisation capabilities.
- Created DataFile class, with methods for reading and writing tables to disk.
- Updated Table and Record classes to implement serialisation.
- Added information on how to compile and run the code to README.
- Fixed: missing documentation for Table test method.
- Tweaked class access modifiers for javadoc.
Release v0.4
This release deals with the Printer class, which handles printing out data to the screen.
- Created Printer class, which offers methods for printing out table information to screen.
- Tweaked Table and Record classes to allow them to return information required by Printer.
- Updated docs.
Release v0.5
There are two main additions to the codebase with this release. The first is the addition of the Database class, which is intended to provide the core functionality for the program, and tie the other classes together. The other is the addition of primary keys to the tables.
- Added primary keys, defined as the first column of any table.
- Updated tests for Table and Record.
- Tweaked DataFile: changed dataDir from a class to an instance variable, added method to list table files.
- Created Database object, designed to be the entry point for the database.
- Made sure tests for Database and DataFile clear out the database directory after themselves.
- Fixed bug that resulted in incorrect file extensions.
Release v0.6
Instead of adding any one main new feature, like previous releases, this release focuses on refactoring, stability and performance improvements. It is partly in preparation for the introduction of queries in the next release.
- Database now builds database in memory from file upon construction.
- Added drop table method to Database.
- Added ability to delete files to DataFile.
- Database now deletes files from disk corresponding to tables that have been dropped.
- Added commit method to Database, which updates the disk from the database in memory.
- Updated many of the classes to make better use of Exceptions; this has simplified a lot of the testing code.
- Fixed bug where DataFile was listing incorrect file names.
- Switched rows in Table to linked list for performance reasons.
- Altered access modifiers on a number of classes and methods to improve abstraction.
- Fixed bug where column deletion could result in loss of primary key.
Release v0.7
The primary target of this release is the new Query class, which acts as an API for CRUD operations on the database. It also includes shifting to a new Ant build system and updates to the Printer class.
- Created Query class, handles simple database queries.
- Tied Query to Database.
- Tweaked current classes to work with requirements of Query.
- Added a build file for Apache Ant, to simplify build process.
- Created ResultTable, a simple subclass of Table with primary key restrictions removed. Designed for use with the Query select method.
- Tweaked Printer: now acts as a component of Table class.
- Updated docs.
Release v0.8
This release introduces the Schema class, designed to maintain metadata about the database.
- Created Schema class, for managing database metadata.
- Modified Database and Query methods to work with Schema.
- Updated docs.