Monday, March 31, 2014

Week 11: Sorting and Efficiency

About Sorting and Efficiency
I have touched on efficiency in several earlier posts, so this time I will pull together the most important points. Their common theme is reducing unnecessary duplication. For example, in week 4 (recursion in nested lists and turtle), I mentioned a technique from our reading that improves the efficiency of recursion: using a variable (usually a dictionary) to record the result of every call so we can avoid repeating work.
Moreover, sorting algorithms are a great case study for efficiency. For instance, the main improvement of Selection Sort over Bubble Sort is that it makes only one exchange per pass. In the worst case, both of them make (n - x) comparisons in the x-th pass over a list of n items. However, Bubble Sort may also make an exchange at every comparison, whereas Selection Sort just remembers the position of the maximum (or minimum) and makes a single exchange at the end of the pass, which can save almost half the work compared with Bubble Sort. The drawback of Selection Sort is that it still examines every item in the average and best cases, while Bubble Sort (with an early exit when no swaps occur) does not; in other words, its time complexity does not change between the worst and best cases. Insertion Sort solves this problem by combining the advantages of both. Like Bubble Sort, it does not go looking for the maximum or minimum but instead finds the "proper" position for the next item in every pass; and like Selection Sort, it does not exchange on every comparison but only places the item once its proper position is found.
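As an illustration of that idea, here is a minimal Insertion Sort sketch (my own version, in the spirit of our readings; the function name and sample list are mine): each pass shifts the larger items one step to the right and then drops the current item into place with a single assignment.

def insertion_sort(alist):
    for index in range(1, len(alist)):
        current = alist[index]          # the item we are inserting this pass
        position = index
        # shift every larger item one step to the right
        while position > 0 and alist[position - 1] > current:
            alist[position] = alist[position - 1]
            position -= 1
        alist[position] = current       # one placement per pass

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
insertion_sort(alist)
print(alist)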
Nevertheless, the efficiency of Insertion Sort still can't satisfy the demands of our computer scientists (lol!!!). Grouping is a good way to increase efficiency in real life, and applying grouping in computer science gives us more amazing algorithms such as Shell Sort, Quick Sort, Merge Sort and Timsort. Shell Sort divides the list into sublists according to a given "gap" and sorts each sublist. After that, most of the list is already nearly sorted, so the final plain Insertion Sort pass has little work left to do. However, the time complexity of Shell Sort depends on the sequence of gaps we choose.
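Here is a rough Shell Sort sketch along those lines (my own, assuming the simple gap sequence n//2, n//4, ..., 1 rather than any particular sequence from the course): each gap-spaced sublist is insertion-sorted, and the final pass with gap 1 is an ordinary Insertion Sort over an almost-sorted list.

def shell_sort(alist):
    gap = len(alist) // 2
    while gap > 0:
        # insertion-sort each of the gap-spaced sublists
        for start in range(gap, len(alist)):
            current = alist[start]
            position = start
            while position >= gap and alist[position - gap] > current:
                alist[position] = alist[position - gap]
                position -= gap
            alist[position] = current
        gap //= 2   # shrink the gap; the last pass (gap == 1) is plain Insertion Sort

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
shell_sort(alist)
print(alist)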
Quick Sort is a close cousin of the Binary Search Tree: choose a pivot value, then partition the list into the values less than the pivot and the values greater than the pivot. Merge Sort, which is my favourite algorithm, needs only n log n time in every case, because it takes log n halvings to reduce the list to single items (i.e. n / 2 / 2 / 2 ... reaches 1, or equivalently 2 * 2 * 2 ... = n, so the number of 2s is log n). In each merge level, every item is compared at most once, so the work per level is still about n, and multiplying the two gives n log n for Merge Sort. The defect of Merge Sort is that even in the best case the time complexity is still n log n, which is larger than the roughly n (in fact n - 1 comparisons) that Bubble Sort and Insertion Sort need on an already-sorted list. Timsort manages to remedy this by combining Merge Sort with Insertion Sort, as I discussed in the last blog post.
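To make the n log n argument concrete, here is a short Merge Sort sketch (my own version of the standard recursive scheme, not the exact code from our readings): the recursion halves the list log n times, and each call merges its two sorted halves with at most n comparisons per level.

def merge_sort(alist):
    """Return a new sorted list; the input list is left unchanged."""
    if len(alist) <= 1:
        return alist
    mid = len(alist) // 2
    left = merge_sort(alist[:mid])    # log n levels of halving
    right = merge_sort(alist[mid:])
    # merge the two sorted halves: at most len(alist) comparisons
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([54, 26, 93, 17, 77, 31, 44, 55, 20]))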
In addition, it seems that improving an algorithm's time complexity can sometimes increase its space complexity, as with Merge Sort, Quick Sort and Timsort. I found another algorithm, Heapsort, which balances the two: it runs in n log n time while sorting in place.
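For reference, here is an in-place Heapsort sketch (my own, following the usual sift-down scheme; the names are mine): it builds a max-heap inside the list and then repeatedly swaps the maximum to the end, so it keeps the n log n running time without allocating a second list.

def heapsort(alist):
    """Sort alist in place in n log n time with constant extra space."""
    n = len(alist)

    def sift_down(start, end):
        # push the value at index start down until the max-heap property holds
        root = start
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and alist[child] < alist[child + 1]:
                child += 1
            if alist[root] < alist[child]:
                alist[root], alist[child] = alist[child], alist[root]
                root = child
            else:
                return

    # build a max-heap
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n - 1)

    # repeatedly move the maximum to the end and shrink the heap
    for end in range(n - 1, 0, -1):
        alist[0], alist[end] = alist[end], alist[0]
        sift_down(0, end - 1)

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
heapsort(alist)
print(alist)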

About Asymptotic Analysis
As I mentioned in the last post, this topic overlaps a little with the content of CSC165. The difference is that while CSC165 focuses on how to find a formula that expresses the running time of a function and on the corresponding asymptotic analysis (i.e. Big-Oh, Big-Omega, Big-Theta), CSC148 pays more attention to explaining the time complexity of given sorting algorithms. Nevertheless, they are closely related to each other. In the paragraph above I analyzed how to get the n log n time complexity of Merge Sort from the CSC148 point of view, but we still can't claim O(n log n) without a detailed proof. So, to be more rigorous, I will sketch a brief proof for a sorting algorithm. Let's take Bubble Sort as an example:

def bubbleSort(alist):
    for passnum in range(len(alist) - 1, 0, -1):   # outer loop: passes of shrinking length
        for i in range(passnum):                   # inner loop: compare adjacent pairs
            if alist[i] > alist[i + 1]:            # 1 comparison
                temp = alist[i]                    # swap using a temporary variable:
                alist[i] = alist[i + 1]            # 3 assignments
                alist[i + 1] = temp

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
bubbleSort(alist)
print(alist)

Assume the running time of this function is T(n), where n is the number of items in the list. In bubbleSort we can see two "for" loops, and the inner loop contains an "if" statement. Since we are talking about Big-Oh, assume the worst case, so that every statement inside the "if" executes on every iteration. Each iteration of the inner loop then performs 4 operations (the comparison plus the three assignments of the swap), giving 4 * passnum operations per pass of the outer loop. In the outer loop, passnum takes the values n - 1, n - 2, ..., 2, 1. So T(n) = 4(n - 1 + n - 2 + ... + 2 + 1) = 4 * n(n - 1) / 2 = 2n^2 - 2n. Since 2n^2 - 2n <= 2n^2 for n >= 0, we get T(n) in O(n^2) in the worst case.
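To double-check the formula, here is a small counting experiment of my own (not part of the course material): a copy of bubbleSort that counts the comparison and the three swap assignments, run on a reverse-sorted list so the worst case is triggered; the count matches 2n^2 - 2n.

def bubble_sort_count(alist):
    ops = 0
    for passnum in range(len(alist) - 1, 0, -1):
        for i in range(passnum):
            ops += 1                         # the comparison
            if alist[i] > alist[i + 1]:
                temp = alist[i]
                alist[i] = alist[i + 1]
                alist[i + 1] = temp
                ops += 3                     # the three swap assignments
    return ops

n = 10
worst_case = list(range(n, 0, -1))           # reverse-sorted: every comparison swaps
print(bubble_sort_count(worst_case))         # 180
print(2 * n ** 2 - 2 * n)                    # 180 as well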

Challenge
To some extent, sorting algorithms are a complex subject in computer science. There are many different algorithms, and each of them has its own features, advantages and shortcomings. It is difficult to memorize the implementations and principles of so many algorithms, especially those involving recursion, multiple loops and helper functions. Furthermore, the computation and proof side of asymptotic analysis is a real challenge because it requires a lot of mathematical knowledge. In addition, I read that "according to information theory, no comparison sort can perform better than Θ(n log n) comparisons in the worst case." I am still wondering why...

Achievement
I think the topic of algorithms will accompany me throughout my university studies, since I will take many mathematics courses. In fact, I like logical analysis of many things, so it is really interesting to dig into some deeper theorems and applications. To my disappointment, CSC148 seems to discuss them only a little, as they were not included in any assignments or exercises. However, the last question of Test 2 is a nice example that we could have spent more time analyzing (one hour is not enough for a detailed answer...). Anyway, they cannot stop my forward steps, just as they couldn't in the past! Come on, final!



Sunday, March 30, 2014

Linked List & Binary Search Tree & Sort

Since I missed several blog posts, I will cover all three topics in this article.
1st: Linked List
A linked list is very similar to the built-in list type in Python. Both are ordered, and we can track items by some kind of "index". We can also implement many of the same methods on a linked list, such as "append", "remove", "+", etc. However, a linked list is really defined recursively, as a pair with only two parts: one is an object (the cargo) and the other is another linked list. In a built-in list, the relationship between items is parallel even though they are ordered, whereas in a linked list each item sits one level above the items that follow it. Because of this, I think the most significant characteristic of a linked list is its efficiency. In the week 2 lab, we found that the runtime of similar tasks such as "enqueue" and "dequeue" on a queue built from a built-in list is much longer than on a stack. The reason, we decided, is that with a built-in list we need to shift every item forward or backward by one position whenever we dequeue or enqueue an item, since the items are parallel, while with a linked list we just reassign a couple of references: roughly, move the front to "self.next" when we dequeue, or attach a new node holding the cargo when we enqueue. We treat each item together with its link as a whole, a node, and only touch the node that we remove or add. For example, our reading has a paragraph on "the fundamental ambiguity theorem", which points out that there is often an ambiguity between a node and a list: a node of a linked list can be viewed as a single item, whereas a list is a collection of many items. As the text says, "These roles are not part of the program; they are in the mind of the programmer."
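Here is a small sketch of that idea (my own illustration, not the lab code; the class and method names are made up): a queue built from linked nodes, where enqueue and dequeue each touch only one node instead of shifting the whole list.

class Node:
    def __init__(self, cargo, next=None):
        self.cargo = cargo   # the item stored at this node
        self.next = next     # the rest of the linked list

class LinkedQueue:
    def __init__(self):
        self.front = None    # node to dequeue next
        self.back = None     # node most recently enqueued

    def enqueue(self, item):
        node = Node(item)
        if self.back is None:        # empty queue
            self.front = self.back = node
        else:
            self.back.next = node    # link the new node after the old back
            self.back = node

    def dequeue(self):
        node = self.front
        self.front = node.next       # just move one reference
        if self.front is None:
            self.back = None
        return node.cargo

q = LinkedQueue()
for x in [1, 2, 3]:
    q.enqueue(x)
print(q.dequeue(), q.dequeue(), q.dequeue())   # 1 2 3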

Challenge
Well, I don't think it is very hard to understand and implement a linked list, since it is essentially a simplified version of a tree, apart from some abstract ideas such as the difference between a node and a list. It is important to avoid creating a cycle in the linked list.


2nd: Binary Search Tree
A Binary Search Tree is a strictly logical data structure. In mathematics, the relationship between any two real numbers is one of exactly three: equal to, greater than, or less than. A Binary Search Tree takes advantage of this fundamental fact to speed up searching. Furthermore, since every logical judgement has only two outcomes, "True" or "False", the method behind a Binary Search Tree is very similar to part of our week 6 reading, "The animal tree". Every node poses a question such as "Is the input x greater or less than the stored value y?" After we find the correct position for x, the tree has in effect learned a new question, "Is the next input greater or less than x?", just as the "animal" function in our reading learns new questions.
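Here is a minimal Binary Search Tree sketch of my own (the class and function names are mine, not the course code): inserting a value follows the greater-than/less-than questions down the tree, and an inorder traversal reads the values back in sorted order.

class BSTNode:
    def __init__(self, cargo):
        self.cargo = cargo
        self.left = None
        self.right = None

def insert(node, value):
    """Insert value into the tree rooted at node; return the (possibly new) root."""
    if node is None:
        return BSTNode(value)
    if value < node.cargo:          # "Is the input less than the stored value?"
        node.left = insert(node.left, value)
    else:
        node.right = insert(node.right, value)
    return node

def inorder(node):
    """Return the values of the tree in sorted order."""
    if node is None:
        return []
    return inorder(node.left) + [node.cargo] + inorder(node.right)

root = None
for value in [3, 1, 2, 4]:          # a differently ordered input...
    root = insert(root, value)
print(inorder(root))                # ...still prints [1, 2, 3, 4]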

Problem
I think there is a subtle issue with Binary Search Trees: if we build trees from lists that contain the same values but in different orders, such as [1, 2, 3, 4] and [3, 1, 2, 4], we get trees with different shapes, and yet traversing either one gives exactly the same sorted list. Therefore, I prefer to call this kind of tree "a method" or "an algorithm" for organizing data rather than a strict data sequence like a list, a tuple or even a linked list. Although the positions of the minimum and maximum values are relatively fixed (the leftmost and the rightmost nodes), the positions of the other values, such as the median, vary from tree to tree.

Challenge
It is hard to combine the concepts of built-in lists, linked lists and binary search trees, especially when converting among them (for instance, on Test 2...). In my view, recursion is the only effective way to solve this kind of problem. Otherwise, I sometimes have to use a loop (of course!) and a built-in list as an intermediate step to handle the conversion between a linked list and a binary search tree.

3rd: Sort
Sort is a large topic that includes many sorting algorithms. We already covered Bubble Sort, Insertion Sort and Selection Sort in CSC108. The new sorting algorithms (Quick Sort, Merge Sort, Shell Sort and Timsort) are really amazing. The differences among them are also clear: how a list is split into sublists, how values are compared, and where they are placed. For example, in Insertion Sort there are only two parts, the sorted part and the unsorted part; in every iteration the first item after the sorted part is compared with the items before it until we find its correct position and insert it there. By contrast, Merge Sort splits the list into many small sublists, puts the items of each sublist into order, then merges every two sublists into a larger sorted one and repeats the process. Every sorting algorithm has its own characteristics and advantages, and inventing new ones is a serious area of study. For instance, Timsort combines the strengths of Merge Sort and Insertion Sort: its minrun plays much the same role inside Merge Sort that the sorted part plays in Insertion Sort.

About Asymptotic Analysis:
As I am taking CSC165, this concept overlaps between the two courses. The CSC165 treatment is actually harder, because there we need to find a function T(n) representing the running time in terms of the number of items n in the input, and then give a detailed proof to establish its Big-Oh, Big-Omega or Big-Theta. Nevertheless, I don't think it is very complicated so far, perhaps because we are just freshmen. Sometimes we can simply draw the conclusion intuitively when no proof is required. In any case, it is very useful for analyzing the efficiency of a function or a sorting algorithm.

Challenge
Memorizing the various sorting algorithms is a little hard, especially their respective Python implementations. The strict proofs of asymptotic analysis are also challenging. However, I think these two topics are very important for my future studies, since I am going to study more statistics and mathematics. I will spare no effort to defeat them!


Sunday, March 2, 2014

Week 7: Recursion

About recursion
We have been learning recursion for several weeks. Actually, I think it really shows the power of the computer. In recursion, we need to trace the process from top to bottom (like popping items from a stack) and then return the results back from bottom to top (like re-pushing the items onto the stack). For us this is an extremely tedious and error-prone process, but a computer can do it in a very short time.

The computer science version of "series"
I think it efficiently takes advantage of the computer's characteristics. A computer can repeat a function call and memorize each call and its corresponding result (or the further calls it makes) through many unique addresses, no matter how many iterations there are. For example, it is very hard for us to trace a complicated series (we may make calculation mistakes or forget intermediate results), but for the computer it is just several iterations of a single function call (i.e. the formula).

Loop & Recursion
Recursion is very similar to a loop: both repeat one or several operations. However, I think there are several clear differences between them. First, a loop needs a condition to decide whether it has finished (a boolean expression in "while" or a range in "for"), whereas a recursion needs a base case and finishes when the result can be computed directly. Second, with recursion we have to design how the result of one call feeds into the next call, whereas with a loop each iteration's result can be independent. Therefore, when we know a base case and how to reuse results between calls, but don't have an obvious stopping condition for a loop, we'd better use recursion.
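As a tiny illustration of the difference (my own example, not from the course): the same sum computed with a loop, where each iteration just updates an accumulator, and with recursion, where the base case stops the calls and each call builds on the result of the next one.

def sum_loop(numbers):
    total = 0
    for x in numbers:      # the range of the loop decides when we stop
        total += x
    return total

def sum_recursive(numbers):
    if numbers == []:      # base case: the result is known directly
        return 0
    # each call reuses the result of the call on the rest of the list
    return numbers[0] + sum_recursive(numbers[1:])

print(sum_loop([1, 2, 3, 4]))        # 10
print(sum_recursive([1, 2, 3, 4]))   # 10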

Challenge
In my view, recursion is a relatively abstract concept in programming; sometimes it feels like "magic". For example, in Assignment 1, when we considered how to move n cheeses among 3 stools, we wrote a function and told the computer step 1, step 2 and step 3 (as in the lecture). For us it is really difficult, because the indices that the "source", "intermediate" and "destination" stools refer to are exchanged in every call, and at the same time we would have to calculate and remember the result of every combination. We only describe the approach and the starting point, the computer shows us the result, and we don't see what happened in between.
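For the record, here is a sketch of that three-step recursion for the 3-stool version (my own minimal version; the function name and the printed moves are just for illustration, not the assignment's interface):

def move_cheeses(n, source, intermediate, destination):
    """Print the moves for shifting n cheeses from source to destination."""
    if n == 1:
        print(source, '->', destination)
    else:
        # step 1: move n-1 cheeses out of the way, onto the intermediate stool
        move_cheeses(n - 1, source, destination, intermediate)
        # step 2: move the biggest cheese to the destination
        print(source, '->', destination)
        # step 3: move the n-1 cheeses from the intermediate stool on top of it
        move_cheeses(n - 1, intermediate, source, destination)

move_cheeses(3, 1, 2, 3)   # 2**3 - 1 = 7 moves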

Achievement
Although it is not easy, I think I have grasped the crucial concept of recursion. Assignment 1, the readings, the labs and the test (I just got my Test 1 score! lol!) have all helped me understand recursion thoroughly. For instance, in lab 5 I learned some useful built-in Python functions such as filter, all and any, which let me save a lot of time and space when programming. Now we have begun to learn trees and linked lists. I hope I can successfully use the power of recursion to win the following challenges!


Sunday, February 23, 2014

Week 6: Tree

After the reading week, let me summarize what I learned in Week 6. The central topic of this week is the Tree, which combines the concepts of recursion and object-oriented programming. With trees we can create, store and analyze data by categories. It is a little hard to keep the three traversal orders straight, because I am used to thinking about a tree in preorder, which I suppose is a kind of subconscious habit of mine.
"preorder": the node comes first, then the left branch, then the right branch.
def print_tree(tree):
    if tree is None: return
    print(tree.cargo, end=" ")
    print_tree(tree.left)
    print_tree(tree.right)
"inorder": the left branch comes first, then the node, then the right branch.
def print_tree_inorder(tree):
    if tree is None: return
    print_tree_inorder(tree.left)
    print(tree.cargo, end=" ")
    print_tree_inorder(tree.right)
"postorder": the left branch comes first, then the right branch, then the node.
def print_tree_postorder(tree):
    if tree is None: return
    print_tree_postorder(tree.left)
    print_tree_postorder(tree.right)
    print(tree.cargo, end=" ")
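These functions assume a node class with cargo, left and right attributes. A minimal version (a sketch along the lines of our reading, with names matching the code above) plus a small tree to try the three traversals on:

class Tree:
    def __init__(self, cargo, left=None, right=None):
        self.cargo = cargo
        self.left = left
        self.right = right

# a tiny tree: 1 at the root, 2 on the left, 3 on the right
tree = Tree(1, Tree(2), Tree(3))
print_tree(tree)            # 1 2 3
print()
print_tree_inorder(tree)    # 2 1 3
print()
print_tree_postorder(tree)  # 2 3 1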
Trees are very useful in computing, and I find they can be linked with logical analysis, especially for CSC165! Every node has two branches, just as every assumption has two opposite cases, true or false, and we can draw conclusions from the different situations by using trees.
I have finished Part I of Assignment 2. The goal for next week is to win the test!

Monday, February 10, 2014

Week 5: Assignment 1 & names in different scopes

Well, I think this week is relatively relaxing, as I finished Assignment 1 a week ago. The instructor taught us an iterative method for solving the Tower of Hanoi. I didn't use it in my assignment, since I had already solved the problem with a non-iterative (recursive) approach. However, I have to acknowledge that the iterative version would be easier for this problem, at least in terms of comprehension. The concept of names in different scopes is easy to understand and very useful. I used a similar method in my assignment: creating a non-local variable in a function to record how to move n cheeses with four stools, in order to increase efficiency. Yes, it derives from last week's reading! I believe this new material will help me solve tougher problems in the future.
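A sketch of that recording idea (my own reconstruction, not the actual assignment code; it assumes the standard recurrence where we set i cheeses aside using four stools, move the rest with three stools, then bring the i cheeses back): a dictionary keeps the best answer for each n so no subproblem is recomputed.

def moves_three(n):
    # number of moves needed with only three stools
    return 2 ** n - 1

best = {1: (1, 0)}   # n -> (fewest moves, number i of cheeses to set aside)

def moves_four(n):
    """Fewest moves for n cheeses with four stools, remembering each result."""
    if n not in best:
        candidates = []
        for i in range(1, n):
            # move i cheeses aside (4 stools), the rest with 3 stools, then i back
            candidates.append((2 * moves_four(i)[0] + moves_three(n - i), i))
        best[n] = min(candidates)
    return best[n]

print(moves_four(5))    # (13, 2): 13 moves, setting 2 cheeses aside first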

Sunday, February 2, 2014

Week 4: recursion in nested list and turtle

This week, we talked about more recursion examples. Through these examples, I learned how to trace recursion and understand the code of recursive functions. In particular, I learned how to make a function more efficient by using a dictionary in the recursion.
Recursion is an easy way to handle some problems that look complicated, but sometimes it wastes time during execution. In our reading, the naive approach takes a lot of time to cope with the Fibonacci numbers. The code looks like this:
def fib(n):
    if n <= 1:
        return n
    t = fib(n-1) + fib(n-2)
    return t
However, the reading also shows how to handle it with a dictionary: we simply create a dictionary to record the values we have already computed:
alreadyknown = {0: 0, 1: 1}

def fib(n):
    if n not in alreadyknown:
        new_value = fib(n-1) + fib(n-2)
        alreadyknown[n] = new_value
    return alreadyknown[n]
It is amazing and was really helpful when I was programming Assignment 1. Now I can effectively speed up my programs with this technique. I find that we need to avoid not only duplicated code but also letting the computer do duplicated work, and I have realized how important and helpful efficiency is. I know we will cover more about efficiency later, so this is a good beginning for me.
After spending two nights and a few hours during the day, I successfully completed all the tasks of Assignment 1. The most difficult part was Tour.py, which makes the computer solve the Hanoi problem by itself. It was a long road to write the code, debug it, revise the documentation and optimize it. Nevertheless, I really enjoyed the process and felt excited when the computer did exactly what I asked!

Saturday, January 18, 2014

My 1st post for SLOG! - Object-Oriented Programming

Hello everyone,
My name is kevin. This is the blog I created for my CSC148 course! I really like computer science because I can make the computer do anything I want! It's amazing when everything runs normally, even if it's also a little frustrating when I've been stuck for a long time because of a tiny mistake. Nevertheless, so far so good.

About Object-Oriented Programming
In fact, I think we already learned its basic concepts in the last part of CSC108, such as the initializer, writing methods (including __eq__ and __str__), objects and instances, etc. For example, on the final exam of CSC108 there was a question about creating a class to store data about wines and a wine cabinet, such as the amount, the brand and whether the owner likes it. Every object has its own attributes, such as colour, size and material, and every object can perform certain functions. It is awesome that we can use object-oriented programming to create these objects in our computer.

What's new
However, there are also many new concepts this week. In the first two weeks we learned a new Python documentation style, PEP 257, which gives us a different contract and docstring style. In addition, I think [function(x) for x in list] is an amazing form for building a new list from an original list, and expression1 if condition else expression2 is also very useful; both are simpler and easier to read. Furthermore, the concept of a Stack is new to me. Although I had seen a stack in CSC108 as part of the Python visualizer, as a class it is really new. Now I see they are actually similar: both operate on the last item of a specialized list.
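Here is a minimal Stack sketch of my own, just to capture that "operate on the last item" idea (not the class from the course handout):

class Stack:
    """A last-in, first-out collection built on a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)   # add on top

    def pop(self):
        return self._items.pop()   # remove and return the top item

    def is_empty(self):
        return self._items == []

s = Stack()
s.push('a')
s.push('b')
print(s.pop())   # b  (the last item pushed comes off first)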

Challenge
The only challenge this week was the lab. First, it is embarrassing to say that I found I still cannot communicate in English very well, especially when using technical terms to express subtle ideas. Well, it's sometimes painful, and I need to put more effort into it. Additionally, we got stuck on the top(n) method. It took us a long time to figure out how to sort the words by frequency, and alphabetically when there were ties. We forgot about using tuples and struggled with converting the dictionary to a list and back again. Thanks to our TA, we finished it.
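For my own notes, here is a small sketch of the tuple trick we eventually needed (my own version, assuming a dictionary that maps each word to its count, not the lab's actual code): sorting on the pair (-count, word) puts the most frequent words first and breaks ties alphabetically.

def top(counts, n):
    """Return the n most frequent words; ties are broken alphabetically."""
    # sort by descending count, then ascending word
    ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, count in ordered[:n]]

counts = {'the': 5, 'a': 5, 'tree': 2, 'stack': 7}
print(top(counts, 3))   # ['stack', 'a', 'the']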

Conclusion: my comments for this week
This week, I think I covered the material reasonably well. I finished Exercise 1 soon after it was posted, and it was really easy in my opinion. The material is closely related to that of CSC165 and CSC108. I have reviewed the CSC108 material and learned the new documentation style, the Stack, and object-oriented programming concepts. I am confident about the weeks ahead!