Geetanjali Agarwal successfully defended her MSCS thesis on image recognition

My student Geetanjali (Geet) Agarwal defended her masters thesis titled Aneka – Wavelet Image Hashing Algorithm, see announcement, where the contribution is a framework of hashing algorithms for image recognition. This important work is done in collaboration with the SoCal High Technology Task Force (HTTF). Geet deployed the AWS to accomplish her results, including EC2 instances and MySQL databases used to run experiments on thousands of images. Geet’s thesis will be available after the final draft is ready.

Iddo Tzameret and Stephen Cook strengthened our results from 2004 in the computational complexity of linear algebra

During my PhD years (1997-2001) in the Computer Science department, at the University of Toronto, my advisor Stephen Cook and I worked on laying the computational complexity foundations of Linear Algebra. To that end we deployed Berkowitz’s algorithm for computing the characteristic polynomial, as it allowed us to state major theorems of linear algebra in the theory NC2 (fast parallel computations). We published the final version of our result in the Annals of Pure and Applied Logic. Recently, Iddo Tzameret and Stephen Cook strengthened those results considerably in this paper.

Using AWS on a project in collaboration with SoCal HTTF to decrypt a password

Anyone working in the field of Digital Forensics is aware that a substantial portion of time is dedicated to reverse engineering passwords. That is, in most cases a digital forensics investigator receives a password-protected handheld device, or a laptop with an encrypted hard disk, or a Microsoft Word document which has been password protected.

It is then the task of the investigator to try to retrieve the evidence, and that in turns requires reverse engineering the password; in some cases this can be achieved by recovering the hash of the password, which is stored somewhere (the locations are often known) on the device’s memory.

In order to obtain the password from the hash, we have to run a brute-force search algorithm that guesses passwords (the guesses can be more or less educated, depending on what is known about the case). Sometimes we get lucky. There are two programs that are used extensively for this purpose: John the Ripper and hashcat.

As we have been studying methods for recovering passwords from hashes, we have been using AWS EC2 instances in order to run experiments and help HTTF with their efforts. Together with senior capstone students as well as graduate students in Cybersecurity, we have been creating a set of guidelines and best practices to help in the recovery of passwords from hashes. AWS EC2 instances are ideal as they can be crafted to the needs and resources of a particular case. For example we are currently running a t2.2xlarge instance on a case where we have to recover the password of a Microsoft Word document; we have also used a p2.16xlarge with GPU-based parallel compute capabilities, but it costs $14/hour of usage, and so we deploy it in a very surgical manner.

Paper with Ryan McIntyre to appear in the Journal of Discrete Algorithms

Happy to announce that Ryan McIntyre’s masters thesis results, An improved upper bound and algorithm for clique covers (prelim), will be published as our joint paper in Journal of Discrete Algorithms.

Our paper is on indeterminate strings, which are important for their applicability in bioinformatics. (They have been considered, for example, in Christodoulakis 2015  and Helling 2017.)

An interesting feature of indeterminate strings is the natural correspondence with undirected graphs. One aspect of this correspondence is the fact that the minimal alphabet size of indeterminates representing any given undirected graph corresponds to the size of the minimal clique cover of this graph. This paper first considers a related problem proposed in Helling 2017: characterize $late \Theta_n(m)$, which is the size of the largest possible minimal clique cover (i.e., an exact upper bound), and hence alphabet size of the corresponding indeterminate, of any graph on n vertices and m edges. We provide improvements to the known upper bound for \Theta_n(m). Helling 2017 also presents an algorithm which finds clique covers in polynomial time. We build on this result with an original heuristic for vertex sorting which significantly improves their algorithm’s results, particularly in dense graphs.

This work was the result of building on Helling 2017 (see this post) and of a year of research undertaken by Ryan McIntyre under my (Michael Soltys) supervision at the California State University Channel Islands.

iSprinkle – a topical capstone project @CSUCI @cagomez805

iSprinkle is a Raspberry Pi powered irrigation controller which will allow a user to set an initial irrigation schedule for a sprinkler system using a web interface, after which it will use the local weather forecast to adjust the base watering schedule as needed. iSprinkle is the result of a senior Capstone project (COMP499)  at California State University Channel Islands, undertaken by student Carlos Gomez in 2016, advised by Michael Soltys. A detailed write up of the project, where we partnered with Prof. Adam Sędziwy (who visited CI in June 2016), can be found here:

A short version of the above paper will be presented at INDOTEC2017:

iSprinkle was also presented at SCCUR 2016, the Southern California Council for Undergraduate Research Conference at UC Riverside on November 12, 2016.

The design of a system such as iSprinkle requires a holistic approach that is very different from most class assignments. The former usually span a few files that are to be turned in within a week or two, making it difficult to implement a system with many “moving parts.” However, iSprinkle’s functionality is divided between the front-end and back-end, both of which need to communicate so that the user’s requests are fulfilled. Designing such a system requires taking into consideration many aspects; from major decisions such as deciding on a backend language to use, to minutiae such as the date and time formats to use across the backend and front end to maintain consistency.

By doing so, iSprinkle will be able to irrigate more efficiently compared to a fixed schedule; by progrmmatically modifying the user’s watering schedule, iSprinkle will increase/decrease the amount of watering that the schedule dictates depending on data that it receives from a weather API. iSprinkle hopes to make it easier for homeowners to conserve water by automating adjustments to their irrigation schedule.

Carlos Adrian Gomez has since graduated from CI, and is working as a Software Engineer in the Ventura area.

Carlos’ on twitter

Running experiments on AWS

I can’t say how happy I am to have AWS. I just got the account set up, started my first instance, and run a simulation for a very interesting project that I am working on with Ryan McIntyre (a student in CS). What took about 15min on my Mac Pro quad core, took 1m40s on the AWS instance.

This is a brave new world! 🙂 . Here us the summary of the experiment:

~/EdgeGraph/EdgeGraph$ time python3 
How many vertices? 7
Generating graphs...
Filtering isomorphisms...
Sorting graphs...
Checking up to 21 edges...
0 / 21 edges complete.
1 / 21 edges complete.
2 / 21 edges complete.
3 / 21 edges complete.
4 / 21 edges complete.
5 / 21 edges complete.
6 / 21 edges complete.
7 / 21 edges complete.
8 / 21 edges complete.
9 / 21 edges complete.
10 / 21 edges complete.
11 / 21 edges complete.
12 / 21 edges complete.
13 / 21 edges complete.
14 / 21 edges complete.
15 / 21 edges complete.
16 / 21 edges complete.
17 / 21 edges complete.
18 / 21 edges complete.
19 / 21 edges complete.
20 / 21 edges complete.
21 / 21 edges complete.
elapsed time: --- 96.4 seconds ---

real 1m40.000s
user 1m36.812s
sys 0m0.113s

Rabin Miller algorithm: witnesses of compositness

The Rabin Miller algorithm for primality testing relies on a random selection of a’s for which it performs some basic tests in order to determine whether the input n is prime or composite. If n is prime, all a’s work, but if n is composite it is possible to show that at least half of the a’s work, i.e., at least half of the a’s are witnesses of compositness. But in fact, it seems like a lot more a’s than just half of those in the set {1,2,…,n-1} work; see the following diagram where I test all composites under about 60,000. It is clear that there are more good witnesses than just half the numbers.


Square-free strings over alphabet lists

In a new paper, Square-free strings over alphabet lists, my PhD student Neerja Mhaskar and I, solve an open problem that was posed in A new approach to non repetitive sequences, by Jaroslaw Grytczuk, Jakub Kozik, and Pitor Micek, in arXiv:1103.3809, December 2010.

The problem can be stated as follows: Given an alphabet list $L=L_1,\ldots,L_n$, where $|L_i|=3$ and $0 \leq i \leq n$, can we always find a square-free string, $W=W_1W_2 \ldots W_n$, where $W_i\in L_i$? We show that this is indeed the case. We do so using an “offending suffix” characterization of forced repetitions, and a counting, non-constructive, technique. We discuss future directions related to finding a constructive solution, namely a polytime algorithm for generating square-free words over such lists.

Our paper will be presented and published in the 26th International Workshop on Combinatorial Algorithms (IWOCA), Verona, Italy, October 2015.

My work on finite games cited by researchers at Vienna school of Economics

I wrote a paper about finite games which I presented at Computability in Europe in Athens, 2009. Now it turns out, that the Vienna school of Economics, Wirtschaftsuniversität Wien, has been citing it repeatedly in the last few months, in particular Aurélien Fichet de Clairfontaine. It is very satisfying to see research being picked up by other areas!

A new paper: Non-repetitive strings, with Neerja Mhaskar

Title: Non-repetitive strings over alphabet lists

Authors: Neerja Mhaskar and Michael Soltys

Abstract: A word is non-repetitive if it does not contain a subword of the form vv. Given a list of alphabets L = L1, L2, . . . , Ln, we investigate the question of generating non-repetitive words w = w1w2 . . . wn, such that the symbol wi is a letter in the alphabet Li. This problem has been studied by several authors (e.g., [GKM10], [Sha09]), and it is a natural extension of the original problem posed and solved by A. Thue. While we do not solve the problem in its full generality, we show that such strings exist over many classes of lists. We also suggest techniques for tackling the problem, ranging from online algorithms, to combinatorics over 0-1 matrices, and to proof complexity. Finally, we show some properties of the extension of the problem to abelian squares.