Let's now look at some structured data: an example database that stores songs, their lyrics, the albums that contain those songs, and the artists who may have sung them.
This example is taken from a SIGMOD paper that discusses searching relational databases, as opposed to querying them.
Let's see what it would take in SQL to get the albums with "world" in their title. One might write SQL such as this, which is Oracle SQL: select star from the album table where the title contains the string "world".
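To make the single-table query concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a hypothetical `album` table with made-up rows, since the schema from the paper is not reproduced here. SQLite has no equivalent of Oracle's `CONTAINS` operator (part of Oracle Text), so a `LIKE` substring match stands in for it.

```python
import sqlite3

# Hypothetical album table with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (album_id TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO album VALUES (?, ?)",
    [("b1", "Off The Wall"), ("b2", "The World Is Yours"), ("b3", "World")],
)

# LIKE with wildcards approximates "title contains 'world'".
# SQLite's LIKE is case-insensitive for ASCII letters by default.
rows = conn.execute(
    "SELECT * FROM album WHERE title LIKE ? ORDER BY album_id", ("%world%",)
).fetchall()
print(rows)  # [('b2', 'The World Is Yours'), ('b3', 'World')]
```

A single table with a single condition stays simple; the difficulty starts once the answer is spread over several tables.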
Things get a little more difficult if one now wants more information from this complex schema.
So let's ask: how many SQL queries will it take to retrieve the names of each artist and the lyrics of every song in an album that has "world" in its title?
Take a look at the schema and answer the question. Please avoid a complex join that joins every single table in the schema. You could do that, but we are trying to get at how many simple queries it will take.
Here is how we could do this. We first retrieve the album from the album table. Then we have to traverse this table to find all the songs in that album, and get their lyrics from this table. We also have to find all the artists who composed this album, B1, from the artist-album table, and then retrieve the actual artist names. If we allow a join that traverses one table, each of these can be done with a single query; otherwise, each will require two separate queries.
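The query count can be checked with a small sketch. The table names `album`, `song`, `album_song`, `artist`, and `artist_album`, and all rows, are hypothetical stand-ins for the schema on the slide. Each call to the helper `run` counts as one simple query, and each follow-up query uses a join that traverses one link table.

```python
import sqlite3

# Hypothetical schema and sample rows; the paper's actual schema may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE album (album_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE song (song_id TEXT PRIMARY KEY, title TEXT, lyrics TEXT);
CREATE TABLE album_song (album_id TEXT, song_id TEXT);
CREATE TABLE artist (artist_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE artist_album (artist_id TEXT, album_id TEXT);
INSERT INTO album VALUES ('b1', 'World');
INSERT INTO song VALUES ('s1', 'Song One', 'la la la'), ('s2', 'Song Two', 'na na na');
INSERT INTO album_song VALUES ('b1', 's1'), ('b1', 's2');
INSERT INTO artist VALUES ('a1', 'Artist A'), ('a2', 'Artist B');
INSERT INTO artist_album VALUES ('a1', 'b1'), ('a2', 'b1');
""")

stats = {"queries": 0}


def run(sql, params=()):
    """Execute one simple query, count it, and return all rows."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


results = {}
# Query 1: find albums whose title contains "world".
for (album_id,) in run("SELECT album_id FROM album WHERE title LIKE ?", ("%world%",)):
    # Query 2: album -> songs and lyrics, joining through one link table.
    songs = run(
        "SELECT s.title, s.lyrics FROM album_song AS a JOIN song AS s "
        "ON a.song_id = s.song_id WHERE a.album_id = ? ORDER BY s.song_id",
        (album_id,),
    )
    # Query 3: album -> artist names, again through one link table.
    artists = run(
        "SELECT ar.name FROM artist_album AS aa JOIN artist AS ar "
        "ON aa.artist_id = ar.artist_id WHERE aa.album_id = ? ORDER BY ar.artist_id",
        (album_id,),
    )
    results[album_id] = {"songs": songs, "artists": [name for (name,) in artists]}

print(results)
print(stats["queries"])  # 3 for one matching album
```

With a single matching album this issues three queries. If joins were not allowed, queries 2 and 3 would each split in two, giving five.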
That is quite complicated for something that would be easy to do if we just had a Google-like search on this database. Unfortunately, that is quite difficult to achieve.
Imagine we had a search interface so that we could issue a query like "off the world", since we didn't really remember the exact title of the album. The SQL approach would end up missing partial matches such as the album title "World" or the album title "Off The Wall".
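The gap between substring matching and keyword search can be illustrated with a minimal sketch. It compares a SQL `LIKE` on the whole phrase against a simple keyword-overlap ranking, which is an assumed stand-in for a Google-like search rather than how any real engine scores results. The album titles are sample data.

```python
import re
import sqlite3

titles = ["Off The Wall", "World", "Thriller"]  # sample titles

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (title TEXT)")
conn.executemany("INSERT INTO album VALUES (?)", [(t,) for t in titles])

query = "off the world"

# SQL style: the whole phrase must appear as a substring of the title.
sql_hits = [row[0] for row in conn.execute(
    "SELECT title FROM album WHERE title LIKE ?", (f"%{query}%",))]


def tokens(text):
    """Lowercased word tokens as a set."""
    return set(re.findall(r"\w+", text.lower()))


# Keyword style: score each title by how many query words it shares,
# then rank by score (ties broken alphabetically) and drop zero scores.
q = tokens(query)
scored = [(len(q & tokens(t)), t) for t in titles]
ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
keyword_hits = [t for score, t in ranked if score > 0]

print(sql_hits)      # []
print(keyword_hits)  # ['Off The Wall', 'World']
```

The exact-phrase query finds nothing, while even this crude overlap score surfaces both partial matches.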
Next, the schema needs to be understood quite carefully in order to issue the multiple SQL queries needed to retrieve the information we want. Sometimes a complex join might be needed.
But there's even more. Suppose there were multiple databases, each with a different schema, and partial or duplicated data across these databases. And suppose the keys used in each database were different, with no relationship to each other.
Most importantly, suppose some of the data is unstructured, in documents such as text files that contain lyrics or artist biographies, and the rest is in structured databases like the lyrics database. How do we search both of these together?
When you search a set of documents and get an album, can you find the songs in that album by looking at the lyrics database, and vice versa?
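One naive way to bridge the two worlds is sketched below, purely as an illustration under strong assumptions. Documents are linked to database rows by case-insensitive title mentions, because there are no shared keys. The tables, the documents, and the helpers `albums_in_document`, `songs_for_album`, and `documents_for_album` are all hypothetical. Substring linking is fragile (a short title like "World" would also match "worldwide"), which is part of why this remains a hard problem.

```python
import sqlite3

# Hypothetical structured side.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE album (album_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE album_song (album_id TEXT, song_title TEXT);
INSERT INTO album VALUES ('b1', 'Off The Wall'), ('b2', 'World');
INSERT INTO album_song VALUES ('b1', 'Track A'), ('b1', 'Track B'), ('b2', 'Track C');
""")

# Hypothetical unstructured side: free text with no keys into the database.
documents = {
    "bio1.txt": "The record Off the Wall was a breakthrough.",
    "bio2.txt": "She toured for two years before recording again.",
}


def albums_in_document(text):
    """Documents -> database: albums whose title is mentioned in the text."""
    lowered = text.lower()
    rows = conn.execute("SELECT album_id, title FROM album ORDER BY album_id")
    return [(album_id, title) for album_id, title in rows if title.lower() in lowered]


def songs_for_album(album_id):
    """Look up the songs of a linked album in the database."""
    rows = conn.execute(
        "SELECT song_title FROM album_song WHERE album_id = ? ORDER BY song_title",
        (album_id,),
    )
    return [song for (song,) in rows]


def documents_for_album(title):
    """Database -> documents: names of documents that mention an album title."""
    return [name for name, text in sorted(documents.items())
            if title.lower() in text.lower()]


for album_id, title in albums_in_document(documents["bio1.txt"]):
    print(title, songs_for_album(album_id))  # Off The Wall ['Track A', 'Track B']
print(documents_for_album("Off The Wall"))  # ['bio1.txt']
```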
The point I'm trying to make is that searching structured data well, in a query-like manner, remains a research problem. So much structured data is accessed through applications only because a lot of complicated SQL programming goes into accessing that data.
Let us conclude by asking whether looking is the same as searching. When we look around a room, for example, and recognize objects and people doing activities, we're not really searching for anything. The same holds when we browse a bookshelf, flip through the pages of a book, or even look at some data, such as a time series, a histogram, or charts, to see if there might be any hidden patterns in the data.
In the first case, while seeing, we are visualizing a scene. Computationally, this involves techniques such as clustering and classification.
In the second and third examples, we're trying to get a feel for a document, a collection of documents, or some data. This requires techniques such as automatically summarizing documents, discovering the topics in documents, and discovering interesting correlations in data without direct intervention.
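As a small taste of the last technique, here is a minimal sketch that scans every pair of numeric series and reports the strongly correlated ones, with no one choosing which pairs to check. It uses plain Pearson correlation, and both the series names and values and the `0.9` threshold are made-up assumptions. Real correlation mining also has to deal with spurious correlations across many pairs.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def strong_correlations(series, threshold=0.9):
    """Check every pair of series and keep those with |r| >= threshold."""
    found = []
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


# Made-up sample data.
series = {
    "plays": [10, 20, 30, 40, 50],
    "downloads": [12, 19, 33, 41, 48],
    "returns": [5, 3, 4, 5, 3],
}
found = strong_correlations(series)
print(found)  # [('downloads', 'plays', 0.992)]
```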
Each of these is a deep research area of current interest, and we'll get into them as we go beyond looking to listening, learning, connecting, and predicting.
So see you next week.