This video talks about data
modeling and UML, the Unified Modeling Language.
The area of data modeling
consists of how we represent
the data for an application.
We've talked at great length about the relational data model.
It's widely used and we
have good design principles for coming up with relational schemas.
We also talked about XML as
a data model, XML is quite
a bit newer and there are
no design principles that are
analogous to the ones for the relational model.
But frequently when people are
designing a database, they'll actually
use a higher level model
that's specifically for database design.
These models aren't implemented by
the database system, rather they're
translated into the model of the database system.
So let's draw a picture of that.
Let's suppose that we have
a relational database management system
which is abbreviated RDBMS often, and
I'll draw that as a disk just out of tradition.
So, if we create a database
in a relational system the
database is going to consist of relations.
But instead of designing relations
directly, the database designer,
we'll draw that up here, will
use instead a higher-level design model.
That model will then go
through a translator, and this
can often be an automatic
process that will translate the
higher level model into the
relations that are implemented by the database system.
So what are these higher-level models?
Historically, for decades in
fact, the entity relationship
model, also known as the
ER model, was a very popular one.
But more recently the unified
modeling language has become popular
for higher-level database design.
The unified modeling language is
actually a very large language,
not just for database designs, but also for designing programs.
So what we're going to look
at is the data modeling subset of UML.
Both of these design models are
fundamentally graphical, so in
designing a database, the user
will draw boxes and arrows, perhaps other shapes.
And also both of them
can be translated, generally automatically, into relations.
Sometimes there may be a little human
intervention in the translation process, but often that's not necessary.
So in the data modeling subset of
UML, there are five basic concepts.
Classes, associations, association classes, sub-classes, and composition and aggregation.
We're just going to go
through each one of those
concepts in turn with examples.
So the class concept in UML
is not specific to data-modeling.
It's also used for designing programs.
The class consists of a
name for the class, attributes of
the class, and methods in the
class, and that's probably familiar to you again from programming.
For data modeling specifically, we
add to the attributes the
concept of a primary key,
and we drop the methods
that are associated since we're focusing,
really, on the data modeling at this point.
So we'll be drawing our examples,
as usual, from an imaginary
college admissions database with
students and colleges and students applying to colleges and so forth.
So one of our classes, not
surprisingly, will be the student class.
And in UML we'll draw a
class as a box
like this, and at the
top we put the name
of the class and then we
put the attributes of the class,
so let's suppose that we'll just keep it simple.
We'll have a student ID, a
student name, and for
now, the student's GPA and
down here in UML would
be the specification of the methods.
Again we're not going to
be focusing on methods since we
are looking at data modeling, and not the operations on the data.
And so one difference is that we'll have no methods.
Another is that we specify
a primary key if we
wish and that's specified
using the terminology PK.
So we'll say that the student ID in this case is the primary key.
And just as in keys in
the relational model, that means
that when we have a set
of objects for the student
class, each object will have a unique student ID.
There will be no student IDs repeated across objects.
In our college application database, we're
also likely to have a
class for colleges, so we'll have a class that we call college.
And for now, we'll make
the attributes of that
class, just the college name and the state.
And again in full UML, there might be some methods down here.
And we'll make the college
name in this case be the primary key.
So we're assuming now that college names themselves are unique.
So that's it for classes.
Pretty straightforward, they look a
lot like relations and of
course, they will translate directly to relations.
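To make the primary key idea concrete, here is a minimal Python sketch of the two classes. The attribute names (sID, sName, GPA, cName, state), the KeyedSet helper, and the sample students are all assumptions made for illustration; none of them are part of UML itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    sID: int      # PK: unique across all Student objects
    sName: str
    GPA: float


@dataclass(frozen=True)
class College:
    cName: str    # PK: we assume college names are unique
    state: str


class KeyedSet:
    """A set of objects in which one attribute acts as the primary key."""

    def __init__(self, key_attr):
        self.key_attr = key_attr
        self.rows = {}

    def add(self, obj):
        key = getattr(obj, self.key_attr)
        if key in self.rows:
            raise ValueError(f"duplicate primary key: {key!r}")
        self.rows[key] = obj


students = KeyedSet("sID")
students.add(Student(123, "Amy", 3.9))
students.add(Student(234, "Bob", 3.6))

# A second object with student ID 123 violates the key and is rejected.
try:
    students.add(Student(123, "Craig", 3.5))
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

The methods compartment is simply left out, matching the choice to drop methods when the class is used for data modeling.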
Next let's talk about associations.
Associations capture relationships between objects of two different classes.
So let's suppose again that
we have our student class and
I won't write the attributes now,
I'll just write it like that
and we have our college class
in our UML design.
If we want to have a
relationship that students apply
to colleges, we write that
just as a line between
the students and the college classes
and then we give it a name.
So we'll call it applied
and that says that we have
objects in the student class and
objects that are in the college class
that are associated with each
other through the applied association.
If we want to introduce a
directionality to the relationship,
so to say that students are
applying to colleges, we can
put in an arrow there,
that's part of the UML language
although we'll see that it doesn't
really make much difference when we
end up translating UML designs to relations.
When we have associations between classes,
we can specify what we call
the multiplicity of those and
that talks about how many objects
of one class can be related
to an object of another class.
So we'll see that we
can capture concepts like one-one
and many-one and so forth.
So let's look specifically at
how we specify those in
a UML diagram, and for
now I'll just use two generic classes.
So let's say I have a
class C1 and I
have a class C2, and let's
say that I have an association
between those two classes, so that would be a line.
And I could give that a name,
let's call it A.  Let's say
that I want to specify that
each object in class C1,
well I'm just going to write those
objects kind of as dots here below the class specification.
Let's say that I
wanted to say that each one
of those is going to
be related to at least
M but at most
N objects in class
C2, so here are class C2 objects.
I'm going to have this kind of fan out in my relationship.
To specify that in the
UML diagram I write M..N
on the right side
of the association line, and
again that says each object
in C1 will be related
to between M and N objects of C2.
Now there are some special cases in this notation.
I can write M dot dot
star, and star means
any number of objects, so
what that would say is
that each object in "C1"
is related to at least "M"
and, as many as it wants, elements of "C2".
I can also write 0..N
and that will
say that each object in C1
is related to possibly none,
for example here we have one
that I haven't drawn any relationships for.
Possibly none and up to N elements of C2.
I can also write zero dot
dot star, and that's basically
no restrictions on the multiplicity.
And just to mention,
the default, actually, is one dot dot one.
So if we don't write anything
on our association we're
assuming that each object is
related to exactly one object
of the other class and that's in
both directions by the way,
so I can put an X..Y
here and that will
restrict how many objects of
C1 each element of C2 is related to.
Incidentally UML allows some abbreviations: 1..1
can be abbreviated as just
plain old 1, and 0..*
can be abbreviated with just star.
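The notation above is small enough to capture in a few lines of Python. This is only a sketch: the function name parse_multiplicity and the choice to represent star as math.inf are assumptions made for illustration. A bare number is treated as "exactly that many", which covers the 1 abbreviation.

```python
import math


def parse_multiplicity(spec="1..1"):
    """Turn a multiplicity string into a (minimum, maximum) pair.

    An omitted multiplicity defaults to 1..1, and '*' stands for 0..*.
    """
    spec = spec.strip()
    if spec == "*":
        return (0, math.inf)
    if ".." not in spec:
        n = int(spec)          # e.g. "1" is short for "1..1"
        return (n, n)
    lo, hi = spec.split("..")
    return (int(lo), math.inf if hi == "*" else int(hi))
```

For example, parse_multiplicity("1..5") gives (1, 5), and parse_multiplicity("*") gives the same bounds as parse_multiplicity("0..*").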
So let's take a look at
our student and college example and
what the multiplicity of the association
of students applying to colleges might be.
So let's suppose that we
insist that students must apply
somewhere, so they apply to at
least one college but they're
not allowed to apply to more
than 5, and furthermore
let's say that no college will
take more than 20,000 applications, so
this example is contrived to
allow me to put multiplicity specifications on both sides.
So again, we'll have our
student class and we'll
have our college class
and we'll have our association
between the student and the
college class, and I'll just write the name underneath here.
Applied.
So let's think about how
to specify our multiplicities for this.
So to specify that a student
must apply somewhere but cannot
apply to more than 5
colleges, we put a one
dot dot five on the college side.
It really takes some thinking sometimes
to remember which side to put the specification on.
But that's what gives us the
fan out from the objects
on the left to the objects on the right.
So it says each student can
apply to up to five
colleges and must apply
to at least one, so we
won't have any who haven't applied anywhere.
On the other side, we want
to talk about how many students
can have applied to a particular
college, and we said it can be no more than 20,000.
We didn't put a lower
restriction on that, so we
would specify that as 0 to 20,000.
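Here is a rough Python sketch of what those two multiplicities mean for actual data. The function name check_applied, the representation of the association as a set of (student ID, college name) pairs, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def check_applied(student_ids, college_names, applied,
                  per_student=(1, 5), per_college=(0, 20000)):
    """Return a list of multiplicity violations for the Applied association."""
    by_student = Counter(s for s, _ in applied)
    by_college = Counter(c for _, c in applied)
    problems = []
    for s in student_ids:
        n = by_student[s]      # 0 if the student never applied
        if not per_student[0] <= n <= per_student[1]:
            problems.append(f"student {s} applied to {n} colleges")
    for c in college_names:
        n = by_college[c]
        if not per_college[0] <= n <= per_college[1]:
            problems.append(f"college {c} received {n} applications")
    return problems


applied = {(1, "Stanford"), (1, "MIT"), (2, "MIT")}
problems = check_applied([1, 2, 3], ["Stanford", "MIT"], applied)
```

Student 3 has not applied anywhere, so the 1..5 bound is violated for that student, while both colleges are well within 0..20000.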
So I mentioned earlier that multiplicity
of associations captures some of
these types of relationships you might
have learned about somewhere else called
one to one, many to one, and so on.
So, let me show the relationship
between association multiplicity and this terminology.
So if we have a one-to-one relationship
between "C1" and "C2," technically one-to-one
doesn't mean everything has to be involved.
What it really means is that
each object on each side
is related to at most one on the other side.
So to say it's a one-to-one relationship
we would put a "zero, dot,
dot, one" on both sides.
Let's see if I can use some colors here.
So what about many-to-one?
Many-to-one says that we can have
many elements of "C1" related
to an element of "C2," but
each element of "C1" will
be related to, at most, one element of "C2."
So in that case we still
have a "zero, dot, dot, one"
on the right side indicating that
each "C1" object is related
to at most one object of
"C2" but we have the
star on the left hand
side indicating that C2 objects
can be related to any number
of "C1" objects and, as
a reminder, star is an abbreviation
for "zero, dot, dot, star."
Many to many has no
restrictions on the relationships.
So that would be a star on both sides.
Pretty simple and the last
concept is the idea of complete relationships.
So a complete relationship is complementary to these others.
It says that every object
must participate in the relationship.
So we can have a complete one
to one, and that would
be one dot dot one on both sides.
We could have a complete many to
one, and that would
be on the left side
one dot dot star, and
on the right side one dot dot
one and, finally, a complete
many to many would be
one dot dot star on each side.
As a reminder, the default if
we don't specify the multiplicity
is one dot dot one on both sides.
So that would be a complete
one to one relationship.
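The correspondence between multiplicities and this terminology can be summarized in a short Python sketch. The function name describe and the (minimum, maximum) tuple representation are assumptions made for illustration; the one-to-many branch is just the mirror image of many-to-one.

```python
import math

STAR = math.inf


def describe(left, right):
    """Name the relationship given the bounds written at each end.

    left  = (min, max) written on the C1 side: how many C1 objects
            each C2 object is related to.
    right = (min, max) written on the C2 side: how many C2 objects
            each C1 object is related to.
    """
    one_left = left[1] == 1
    one_right = right[1] == 1
    if one_left and one_right:
        kind = "one-to-one"
    elif one_right:
        kind = "many-to-one"
    elif one_left:
        kind = "one-to-many"
    else:
        kind = "many-to-many"
    # Complete means every object on both sides must participate.
    complete = left[0] >= 1 and right[0] >= 1
    return ("complete " if complete else "") + kind
```

So describe((0, STAR), (0, 1)) is "many-to-one", and the default describe((1, 1), (1, 1)) is "complete one-to-one".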
Ok, we've finished with classes and with associations.
Now let's talk about association classes.
Association classes generalize the
notion of associations by allowing
us to put attributes on the
association itself and, again, we'll use our example.
So we already knew how to
specify that students apply to
colleges, but what if associated
with the application we wanted
to have, for example, the date
that they applied and maybe the decision of that application.
We don't really have a way
to do that without adding a
new construct, and that construct
is what's known as an association class.
So we can make a class and we'll just call it "App Info".
And it looks like a class,
it's got the box with the name at the top and the attributes.
And then we just attach that
box to the association,
and that tells us
that each instance of
the association between a student
and a college has additional information,
a date of that application and the decision of that application.
Now there's a couple of things I want to mention.
First of all, in a number
of examples, I'll probably leave
out the multiplicities on the ends of the associations.
That doesn't mean I'm assuming the default one dot dot one.
It's just when it's not relevant, I'm not going to focus on that aspect.
Now when we have students associated with colleges.
So we have a student here we have a college.
Then we have an association between those.
Now what we're saying is that
association is going to
have affiliated with it a date and a decision.
What we cannot describe in
UML is the possibility
of having more than one
relationship or association between the
same student and the same college.
So an association
assumes at most one
relationship between two objects.
So, for example, if we
wanted to add the possibility
that students could apply to
the same college multiple times so
maybe, you know, they want
to apply for separate majors.
That would actually have to be captured quite differently.
We'd have to add a separate
class for the
application information with separate
relationships to the students and colleges.
So this is a, in my
mind, a slight deficiency of UML.
Again, the issue is that it only
captures, at most, one relationship
between the two specific
objects across the two classes.
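One way to picture this restriction is a Python dictionary keyed by the (student, college) pair. This is only a sketch: the names AppInfo, Application, sID, cName and major, and the sample values, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AppInfo:
    """Association class: attributes that live on the association itself."""
    date: str
    decision: str


# One entry per (student ID, college name) pair, so at most one
# association instance can exist between a given student and college.
applied = {}
applied[(123, "Stanford")] = AppInfo("2024-01-15", "accept")
applied[(123, "MIT")] = AppInfo("2024-01-20", "reject")

# A second application to the same college has nowhere to go:
# it replaces the first one instead of being stored alongside it.
applied[(123, "Stanford")] = AppInfo("2024-02-01", "pending")


@dataclass
class Application:
    """Workaround: a separate class related to both Student and College."""
    sID: int
    cName: str
    major: str
    date: str
    decision: str


applications = [
    Application(123, "Stanford", "CS", "2024-01-15", "accept"),
    Application(123, "Stanford", "EE", "2024-02-01", "pending"),
]
```

With the separate Application class, the same student and college can appear together more than once, which the association class alone cannot express.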
Now, sometimes we can make
a design that has an association
class and it turns out
we didn't really need it and
we're going to come back to
multiplicities to see how this
can happen, so again let's
take a look at just generic classes C1 and C2.
Let's say that we have an
association between them and then we have an association class.
We'll just call it AC.
And that's gonna have some
attributes, we can call them A1 and A2 for now.
And of course, there's attributes
in C1 and C2 as well.
Let's suppose that the multiplicity
on, let's say the left
side is star so anything
goes, and on the right
side we have one dot dot one.
So what that multiplicity says is
that each object of C1
is related to at most one object of C2.
So, actually exactly one object in this case.
So we know that there's
going to be just one association for
each object of C1, and
if there's only going to
be one association actually we
could take these attributes and we
could put those attributes as part
of C1 instead of having
a separate association class, so
for example, if this class
happened to be the student class,
and this was the college class,
and we insisted that each
student apply to exactly one
college then the attributes
we had down here, the date
and decision, could be moved
into the student class, because we
know they're only applying to one
college, so that would be
the date and the decision for
the one college they're applying to.
Furthermore, if we had zero
dot dot one, we can
still move these attributes here
and, in that case, if a
student was not involved in a
college - had not applied
to a college at all or, more
generally, an object of "C1"
was not related to any
object of  "C2" then those
attributes would have the equivalent
of null values in them.
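As a small Python sketch of moving the association class attributes into C1: the function name fold_into_c1, the dictionary layout, and the sample data are assumptions made for illustration, and None plays the role of the null value.

```python
def fold_into_c1(students, applied):
    """Move date and decision from the association class into Student.

    students: {sID: {attribute: value}}
    applied:  {(sID, cName): {"date": ..., "decision": ...}}
    Only valid when each student is related to at most one college
    (multiplicity 1..1 or 0..1 on the college side).
    """
    folded = {sid: {**attrs, "date": None, "decision": None}
              for sid, attrs in students.items()}
    seen = set()
    for (sid, _cname), info in applied.items():
        if sid in seen:
            raise ValueError(f"student {sid} has more than one application")
        seen.add(sid)
        folded[sid].update(info)
    return folded


students = {123: {"sName": "Amy"}, 234: {"sName": "Bob"}}
applied = {(123, "Stanford"): {"date": "2024-01-15", "decision": "accept"}}
folded = fold_into_c1(students, applied)
```

Student 234 has not applied anywhere, which is the 0..1 case, so that student's date and decision stay None.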
By the way, it is possible for
an association to be between a class and itself.
For example, we could have
our student class and maybe
we're going to have an association
called "sibling", a student
being associated with another student
because they're siblings, an association
between a class and itself
is written with a line that
just goes between the class and itself.
And then we could label that sibling.
And for multiplicities we can
assume that every student
has between 0 and an
arbitrary number of siblings, let's
say, so we can put
a star on both ends of that association.
A more interesting association might
involve colleges where say
we have for every college
a flagship main campus.
But then some colleges have separate
branch or satellite campuses, so
that would be an association
between a college and itself
saying that one college is
a branch of another college.
Now let's think about the multiplicities here.
First of all, when we
have a self association, in UML
we're allowed to label the two ends of the association.
So I could, for example, say
on one end we have the
home campus.
And on another end we have the satellite campus.
And now with those labels
we can see the asymmetry and
that lets us get our associations right.
So let's say that every satellite
campus must have exactly one
home campus, so that would
be a one dot dot one here
and every home campus can have
any number of satellite campuses.
Or actually, let's say something else.
Let's say every home campus
can have between zero and ten
satellite campuses, so that would be a
zero dot dot ten on that side of the self association.
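The home and satellite labels can be sketched in Python as a mapping from each satellite campus to its one home campus. The names home_of and add_satellite and the campus names are assumptions made for illustration.

```python
# satellite campus -> its single home campus.
# Using the satellite as the dictionary key gives each satellite
# exactly one home (the 1..1 end of the self association).
home_of = {}


def add_satellite(home, satellite, max_satellites=10):
    if satellite in home_of:
        raise ValueError(f"{satellite} already has a home campus")
    count = sum(1 for h in home_of.values() if h == home)
    if count >= max_satellites:      # the 0..10 end
        raise ValueError(f"{home} already has {max_satellites} satellites")
    home_of[satellite] = home


for i in range(10):
    add_satellite("Main", f"Branch {i}")

# An eleventh satellite for the same home campus breaks 0..10.
try:
    add_satellite("Main", "Branch 10")
    eleventh_rejected = False
except ValueError:
    eleventh_rejected = True
```

Without the two labels there would be no way to tell which end of the line each multiplicity belongs to, since both ends attach to the same class.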
Ok, we're finished with the first
three let's move on to sub classes.
For sub classes we're gonna do
a fairly large example that involves
students that we're gonna
separate into foreign students and domestic students.
We're also going to separately specify
students who have taken AP
classes and those will be our AP students.
So we're going to have the student
class as the top
of our hierarchy and the
student class will, again, have
the student ID, let's say
the student name, and GPA,
and we'll say that the student
ID is the primary key for
objects in that class, we're
going to have three sub classes,
one is going to be the
foreign students, we'll call it
foreign S, one is going
to be the domestic students and
then we're also going to
have a sub class for AP students.
And I'm going to assume that
you already know a little bit about sub classing from programming.
So the idea is that when
we have a sub class,
there are attributes that are
specific to the objects
that are in that sub class
and they'll inherit the attributes from their super class.
So we're gonna make student be a super class here.
And this is how we draw it,
with three sub classes here
for foreign student, domestic student, and AP student.
And we'll say that foreign students
have in addition to a
student ID, a student name and
GPA, a country that they come from.
We'll say that Domestic students
are going to have a state
that they come from and we'll
also say that they have
a Social Security number, which we
don't know that foreign students would necessarily have.
AP students, interestingly, is going to be empty.
It's not going to have any
additional attributes, but
the AP students are the
students that are going to
be allowed to have a
relationship with AP courses.
We'll say that the
AP course has a course number
and that's probably the primary key.
And maybe a title for the course and some units for the course.
And then when one of our AP students takes the course.
We'll call this association "took".
We're going to have an association class
that goes along with that, that's
going to have the information,
let's call it "AP info", about
them taking that particular AP
class and we'll say that
association class has for
example the year that they
took the class and maybe the grade that they got in the class.
And lastly let's add some multiplicities.
Let's say that AP students
can take between one and
ten AP classes, so they must have
taken at least one to
be an AP student, and let's
say that every course is taken
by at least one student and up to an
arbitrary number of students.
So this is one of the
biggest UML diagrams we've seen so far.
Again, this is a superclass up here.
And we have our subclasses down here.
And then we also have an
association, and an association class, and some multiplicities.
And again notice that
it's ok that there are
no attributes in the AP
student subclass; that subclass
is defined as those students
who have taken an AP course.
Here is some terminology and some properties
associated with subclass relationships.
Superclasses in UML
are sometimes called generalizations, with
subclasses called specializations. A
subclass relationship is said
to be complete if every
object in the super
class is in at least
one sub class and it's
incomplete if that's not
the case and incomplete is also
sometimes known as partial. A
subclass relationship is known
as disjoint if every object
is in at most one subclass.
In other words, we don't have any
objects that are in more than
one subclass, and that's sometimes called exclusive.
And if it's not disjoint, then
it's overlapping, meaning that objects
can be in multiple sub classes.
We can have any combination of these
pairs, so we can have
incomplete overlapping, incomplete
disjoint, complete disjoint,
or complete overlapping. Let's take
a look back at our example.
For this example we will
probably have the case
that it's a complete subclass relationship.
In other words, every student is
in at least one subclass,
presumably every student is either
a foreign student or a domestic
student, and furthermore,
we're going to say that
it's overlapping because we will
have students who, for example,
are both a domestic student and an AP student.
And in UML, the actual notation
is to put little curly braces here
to specify that that subclass
relationship is complete and overlapping.
To illustrate some of the
other cases, let's suppose that
we didn't have this whole section here with the AP students.
We only had foreign and domestic students.
In that case, we would say
that the subclass relationship is complete.
But in that case it would not be overlapping.
It would be disjoint.
Or suppose we didn't have this
whole left side here so all
we had was the AP student subclass.
In that case, it would
probably be an incomplete
subclass relationship because not everybody
is an AP student, and
there wouldn't be any difference between
overlapping and disjoint since there
would be only one subclass in that case.
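These two properties are easy to check on actual data. Here is a minimal Python sketch; the function name classify, the representation of each class as a set of student IDs, and the sample IDs are assumptions made for illustration.

```python
def classify(superclass, subclasses):
    """Describe a subclass relationship.

    superclass: set of keys for all objects in the superclass
    subclasses: {subclass name: set of keys in that subclass}
    """
    members = list(subclasses.values())
    covered = set().union(*members)
    # Complete: every superclass object is in at least one subclass.
    complete = superclass <= covered
    # Disjoint: no object is counted twice across the subclasses.
    disjoint = sum(len(m) for m in members) == len(covered)
    return ("complete" if complete else "incomplete",
            "disjoint" if disjoint else "overlapping")


students = {1, 2, 3}
foreign, domestic, ap = {1}, {2, 3}, {2}

full = classify(students, {"ForeignS": foreign, "DomesticS": domestic,
                           "APStudent": ap})
no_ap = classify(students, {"ForeignS": foreign, "DomesticS": domestic})
only_ap = classify(students, {"APStudent": ap})
```

With a single subclass the check always reports disjoint, which matches the point that the distinction makes no difference in that case.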
Okay we've now made it to
our last concept which is composition and aggregation.
Let me start by clarifying right off
that aggregation here has nothing
to do with aggregation in SQL.
Well, it's a completely different concept.
So let's first talk about composition.
Composition is used when we
have a database structure where
objects of one class kind
of belong to the objects
of another class and the
example I am going to use is colleges and departments.
So I've drawn the two classes here.
And let's say for the department
we have the department name and
we have say the building that the department is in.
And so we're assuming that
each college has a whole bunch
of departments, now we can
make a relationship, an association
between colleges and departments to
say that the department is in
a college but when we
have the idea that the
departments belong to a
specific college then that's
when this composition construct is used.
And the way the composition is written
is by putting a diamond over
here on the end of the association.
So composition is really a special type of association.
And we'll fill in that diamond here to indicate composition.
Aggregation happens to have
an empty diamond which we'll see in
a moment so when
we have the diamond and
we're creating one of
these composition relationships there's
implicitly a one dot
dot one on the left side
so each department belongs to
one college but what's
kind of interesting here, what's
a little different from the
normal relationship is that
we're not assuming that this
department name is a primary key exactly.
We could have this same department, in
fact even in the same building,
in different colleges and that
would be okay because a department
is through this relationship associated with its college.
So that was composition, objects of
one class belonging to objects of another.
Let me give an example of aggregation.
This is a slight stretch but
what I'm going to make is a class of apartments.
Not departments but apartments.
So we're going to imagine that
there are apartment buildings
represented in our database, maybe
they have an address that's the primary
key and something like the number
of units, and what we're
going to imagine is that
some apartment buildings are owned
by or associated with the
college but not all of them are.
And that's what aggregation does.
So for aggregation we again
have a relationship here, but
in this case, we make a
diamond on this side that
is open, and what that
says is that each apartment,
each object in the apartment
class, belongs to
at most one college:
either one college or no college at all.
So we can have apartments that belong
to a college we can have,
kind of, free-floating apartments and that's
what the open diamond, which is aggregation, is about.
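The difference between the two diamonds can be sketched in Python. The class and attribute names (Department, Apartment, dName, building, addr, units, college), the add_department helper, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Department:
    college: str    # composition: the owning college is always present
    dName: str      # unique only within its college
    building: str


@dataclass(frozen=True)
class Apartment:
    addr: str                       # PK
    units: int
    college: Optional[str] = None   # aggregation: at most one college, or none


departments = {}


def add_department(dept):
    # The department name alone is not a key, so a department is
    # identified together with the college it belongs to.
    key = (dept.college, dept.dName)
    if key in departments:
        raise ValueError(f"duplicate department: {key!r}")
    departments[key] = dept


# The same department name, even in the same building, in two colleges.
add_department(Department("Stanford", "CS", "Gates"))
add_department(Department("MIT", "CS", "Gates"))

owned = Apartment("10 College Ave", 40, college="Stanford")
free_floating = Apartment("1 Main St", 12)
```

A Department cannot be created without a college, while an Apartment can, which is the filled versus open diamond distinction.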
So in conclusion, the data
modeling portion of the
Unified Modeling Language can be
used to perform database design at a higher level.
It's a graphical language.
We went through the five main
concepts of the language, and also
very importantly UML designs can
be translated to relations automatically.
And that is the topic of the next video.