Let's return to technology now and ask
about search in a different, private
context, such as searching one's own
desktop, one's own email, and other
private data sets.
Clearly, indexing the way we described it
in the very beginning of this course will
work fine.
But what about relevance?
Normally we don't have links between
different documents on our desktop, or
hyperlinked emails, so we can't directly
use PageRank.
Instead, we need to use other associations.
For example, we might link documents
that talk about the same people or the
same places. Or we might use relevance
feedback, tracking our own behavior to
see which documents we actually use in
response to a set of search results,
much as PageRank is being improved by
our own use of search every day.
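As a rough illustration of relevance feedback on a private collection, the
sketch below boosts documents that the user opened after earlier searches.
The class name `FeedbackRanker`, the `boost` weight, the file names, and the
base scores are all hypothetical, chosen only for this example; a real
desktop search tool would combine many more signals.

```python
from collections import Counter


class FeedbackRanker:
    """Re-rank results using counts of documents the user actually opened."""

    def __init__(self, boost=0.5):
        self.boost = boost      # assumed weight for each past open
        self.opens = Counter()  # (query term, doc id) -> number of opens

    def record_open(self, query, doc_id):
        # Remember that doc_id was opened in response to each query term.
        for term in query.lower().split():
            self.opens[(term, doc_id)] += 1

    def rerank(self, query, scored_docs):
        # scored_docs is a list of (doc id, base score) pairs from the index.
        terms = query.lower().split()

        def adjusted(item):
            doc_id, base = item
            clicks = sum(self.opens[(t, doc_id)] for t in terms)
            return base + self.boost * clicks

        return sorted(scored_docs, key=adjusted, reverse=True)


ranker = FeedbackRanker()
results = [("budget_v1.xlsx", 1.0), ("budget_final.xlsx", 0.9)]
before = [doc for doc, _ in ranker.rerank("budget", results)]

# The user searched for "budget 2012" and opened the final version.
ranker.record_open("budget 2012", "budget_final.xlsx")
after = [doc for doc, _ in ranker.rerank("budget", results)]
print(before, after)
```

Before any feedback the base scores decide the order; after one recorded
open, the document the user actually used moves to the top.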
But there are even more problems with
private data.
Most of the time, each document has
multiple versions: different formats
of the same document, like PowerPoint
and PDF, and many revisions of the
same document as it undergoes editing.
So detecting duplicates and handling them
appropriately is very important.
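One common way to spot near-duplicate versions, though not necessarily the
one intended here, is to compare overlapping word sequences (shingles) using
Jaccard similarity. The sketch below assumes plain text has already been
extracted from each file; the function names, file names, and the 0.4
threshold are illustrative assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences that occur in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicates(docs, threshold=0.4, k=3):
    """Return pairs of document names whose shingle sets overlap enough."""
    sets = {name: shingles(text, k) for name, text in docs.items()}
    names = sorted(sets)
    pairs = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if jaccard(sets[x], sets[y]) >= threshold:
                pairs.append((x, y))
    return pairs


docs = {
    "report_v1.txt": "quarterly sales rose in the northern region this year",
    "report_v2.txt": "quarterly sales rose sharply in the northern region this year",
    "memo.txt": "the office will be closed on friday for maintenance",
}
pairs = near_duplicates(docs)
print(pairs)
```

Comparing every pair is fine for a handful of files; a large collection
would need something cheaper, such as hashing the shingles into signatures.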
Lastly, is search the only paradigm for
finding stuff?
This takes us to areas such as topic
mining, activity mining, and contextual
suggestions.
We'll return to some of these advanced
topics very soon.
But before that, let's make things even
more difficult and talk about databases,
which are used in large enterprises,
and about enterprise search using such
databases as well as a lot of
unstructured, textual data.
Enterprise search poses all the challenges
of private search that we discussed on the
previous chart.
And more.
For example, the results of a search could
depend on the context in which somebody is
performing that search.
And people play multiple roles in an
organization.
Sometimes, I'm acting as a researcher.
Sometimes as a teacher.
Sometimes as an executive and so on.
Next, how do you classify large sets
of documents?
Each one of us faces challenges
classifying our own documents on our
desktops.
The problem becomes even more
complicated when you have to classify
documents used by hundreds or thousands
of people.
What kind of classification works?
Should it be manually done, by a central
team?
Or can it be done automatically?
Can you have many different
classifications depending on how you
want to view a whole bunch of documents?
What about security?
Not everybody's allowed to access every
document, or every piece of data in an
organization.
Some things are secret, and some highly
secret.
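To make the security point concrete, here is a minimal sketch of filtering
search results by the searcher's clearance. The three level names, the
`Document` fields, and the sample records are assumptions made up for this
example; real enterprise systems typically use much richer access control.

```python
from dataclasses import dataclass

# Assumed ordering of sensitivity levels, lowest first.
LEVELS = {"public": 0, "secret": 1, "highly secret": 2}


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    level: str  # one of the keys in LEVELS


def search(docs, query, clearance):
    """Return ids of documents that match every query term and that the
    searcher's clearance permits them to see."""
    terms = query.lower().split()
    allowed = LEVELS[clearance]
    return [
        d.doc_id
        for d in docs
        if LEVELS[d.level] <= allowed
        and all(t in d.text.lower().split() for t in terms)
    ]


docs = [
    Document("d1", "merger plan for next quarter", "highly secret"),
    Document("d2", "hiring plan for the research team", "secret"),
    Document("d3", "cafeteria plan and menu", "public"),
]
print(search(docs, "plan", "public"))
print(search(docs, "plan", "secret"))
```

The same query returns different results for different people, because
documents above the searcher's clearance are dropped before anything is
shown.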
And lastly, what about structured data?
The kind that's found in databases.
Unfortunately, SQL is not the answer.
For example, text inside structured
records is not easily searched using
SQL, as we'll explain shortly.
Next, linking unstructured documents to
structured documents is also important,
and not easy to do.
Finally, just searching structured records
and getting a list of related records
grouped together as objects is a huge
challenge, which has simply not been
satisfactorily resolved yet, and that's
what we'll talk about in our next example.