{
localUrl: '../page/8r4.html',
arbitalUrl: 'https://arbital.com/p/8r4',
rawJsonUrl: '../raw/8r4.json',
likeableId: '4080',
likeableType: 'page',
myLikeValue: '0',
likeCount: '1',
dislikeCount: '0',
likeScore: '1',
individualLikes: [
'AltoClef'
],
pageId: '8r4',
edit: '2',
editSummary: '',
prevEdit: '1',
currentEdit: '2',
wasPublished: 'true',
type: 'wiki',
title: 'Difference Between Weights and Biases: Another way of Looking at Forward Propagation',
clickbait: 'My understanding of Forward Propagation',
textLength: '3823',
alias: '8r4',
externalUrl: '',
sortChildrenBy: 'likes',
hasVote: 'false',
voteType: '',
votesAnonymous: 'false',
editCreatorId: 'AltoClef',
editCreatedAt: '2017-10-15 10:37:37',
pageCreatorId: 'AltoClef',
pageCreatedAt: '2017-10-15 09:15:12',
seeDomainId: '0',
editDomainId: '2835',
submitToDomainId: '0',
isAutosave: 'false',
isSnapshot: 'false',
isLiveEdit: 'true',
isMinorEdit: 'false',
indirectTeacher: 'false',
todoCount: '2',
isEditorComment: 'false',
isApprovedComment: 'false',
isResolved: 'false',
snapshotText: '',
anchorContext: '',
anchorText: '',
anchorOffset: '0',
mergedInto: '',
isDeleted: 'false',
viewCount: '17',
text: '## What are Weights and Biases\n\nConsider the following forward propagation rule:\n$$\n\\vec{y_{n}}=\\mathbf{W_n}^T \\times \\vec{y_{n-1}} + \\vec{b_n}\n$$\nwhere $n$ is the index of the layer and $\\vec{y_n}$ is the output of the $n^{th}$ layer, expressed as an $l_n \\times 1$ vector ($l_n$ is the number of neurons in the $n^{th}$ layer). $\\mathbf{W_n}$ is an $l_{n-1} \\times l_{n}$ matrix storing all the weights of every connection between layer $n-1$ and layer $n$, and therefore has to be transposed for the product to be defined. $\\vec{b_n}$, again, holds the biases of the connections between the $(n-1)^{th}$ and $n^{th}$ layers, and has shape $l_n\\times1$.\n\nAs one can see, both weights and biases are simply adjustable and differentiable (thus trainable) parameters that contribute to the final result.\n\n## Why do we need both of them, and why are Biases Optional?\n\nA neural network can be seen as an improved version of the perceptron model: instead of outputting a plain 0/1, each neuron (perceptron) produces an output that depends linearly on its input. (This output is further passed through an activation function to make it non-linear, which will be discussed later.)\n\nThe easiest way to create such a linear correlation is to scale the input by some coefficient $w$ and output the scaled value:\n$$\nf(x)=w\\times x\n$$\n\nThis model works alright; even a single neuron can perfectly fit a linear function like $f(x)=m\\times x$, and certain non-linear relations can be fit by neurons working in layers.\n\nHowever, a neuron without a bias lacks a significant ability that even the perceptron has: it always fires regardless of the input, so it cannot fit functions like $y=mx+b$, and it is impossible to disable the output of a specific neuron below a certain threshold value of the input. Adding more layers and neurons eases and hides this issue to a large extent, but a network without biases is still likely to perform worse than one with biases (assuming the total number of layers/neurons is the same).\n\nIn conclusion, the biases are supplements to the weights that help a network fit the pattern better; they are not strictly necessary, but they help the network perform better.\n\n## Another way of writing the Forward Propagation\n\nInterestingly, the forward propagation rule\n$$\n\\vec{y_{n}}=\\mathbf{W_n}^T \\times \\vec{y_{n-1}} + 1 \\times \\vec{b_n}\n$$\ncould also be written like this:\n$$\n\\vec{y_{n}}=\n\\left[ \\begin{array}{c}\n \\mathbf{W_n} \\\\\\\\ \\vec{b_n}^T\n\\end{array} \\right]^T\n\\times\n\\left[ \\begin{array}{c}\n \\vec{y_{n-1}} \\\\\\\\ 1\n\\end{array} \\right]\n$$\nwhich is\n$$\n\\vec{y_{n}} = \\mathbf{W_{new_n}}^T \\times \\vec{y_{new_{n-1}}}\n$$\nwhere $\\vec{y_{new_{n-1}}}$ is $\\vec{y_{n-1}}$ with a constant $1$ appended, and $\\mathbf{W_{new_n}}$ is $\\mathbf{W_n}$ with $\\vec{b_n}^T$ appended as an extra row. This way of rewriting the equation makes the adjustment by gradient really easy to write, because the weights and the biases now share one and the same update rule.\n\n## How to update them?\n\nIt's super easy after the rewrite:\n$$\n\\mathbf{W_{new}} \\leftarrow \\mathbf{W_{new}}-\\eta\\frac{\\partial Error}{\\partial \\mathbf{W_{new}}}\n$$\nwhere $\\eta$ is the learning rate. Since the biases live inside $\\mathbf{W_{new}}$, this single rule trains both the weights and the biases at once.\n\n## The Activation Function\n\nThere is one more component yet to be mentioned--the Activation Function. It is basically a function that takes the raw output of a neuron as its input and returns whatever value is defined as the final output of the neuron:\n$$\n\\vec{y_{n}} = Activation(\\mathbf{W_n}^T \\times \\vec{y_{n-1}} + \\vec{b_n})\n$$\nThere are copious types of them around, but they all share at least one property: they are all *non-linear*!\n\nThat's basically what they are designed for. Activation functions pass the output through a non-linear function, thus introducing non-linearity into the model. 
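\n\nTo make this concrete, here is a minimal NumPy sketch of one layer's forward pass, written both in the plain form and in the augmented form from the previous section (this code is an illustration added here, not part of the original derivation; the layer sizes, the choice of tanh as the activation, and the variable names are assumptions). Both versions produce the same output:\n\n```python\nimport numpy as np\n\ndef forward(y_prev, W, b, activation=np.tanh):\n    """One layer in the plain form: y_n = activation(W^T y_{n-1} + b)."""\n    return activation(W.T @ y_prev + b)\n\ndef forward_augmented(y_prev, W_new, activation=np.tanh):\n    """Same layer in the augmented form: a constant 1 is appended to the\n    input, and the biases are appended to the weight matrix as an extra row."""\n    y_aug = np.vstack([y_prev, [[1.0]]])   # shape (l_{n-1}+1, 1)\n    return activation(W_new.T @ y_aug)     # W_new has shape (l_{n-1}+1, l_n)\n\n# Example layer with 3 inputs and 2 outputs (sizes chosen arbitrarily).\nrng = np.random.default_rng(0)\ny_prev = rng.standard_normal((3, 1))       # l_{n-1} x 1\nW = rng.standard_normal((3, 2))            # l_{n-1} x l_n, as in the text\nb = rng.standard_normal((2, 1))            # l_n x 1\n\nW_new = np.vstack([W, b.T])                # biases become the last row\n\nprint(forward(y_prev, W, b))\nprint(forward_augmented(y_prev, W_new))    # identical result\n```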
\n\nConsider non-linearly separable problems like the XOR problem: giving the network the ability to draw non-linear separators may help the classification.\n\nAlso, there is another purpose of the activation function, which is to squash a huge input into a bounded range such as $(-1, 1)$, thus making the follow-up calculations easier and faster.\n\n2017/10/15',
metaText: '',
isTextLoaded: 'true',
isSubscribedToDiscussion: 'false',
isSubscribedToUser: 'false',
isSubscribedAsMaintainer: 'false',
discussionSubscriberCount: '1',
maintainerCount: '1',
userSubscriberCount: '0',
lastVisit: '',
hasDraft: 'false',
votes: [],
voteSummary: 'null',
muVoteSummary: '0',
voteScaling: '0',
currentUserVote: '-2',
voteCount: '0',
lockedVoteType: '',
maxEditEver: '0',
redLinkCount: '0',
lockedBy: '',
lockedUntil: '',
nextPageId: '',
prevPageId: '',
usedAsMastery: 'false',
proposalEditNum: '0',
permissions: {
edit: {
has: 'false',
reason: 'You don't have domain permission to edit this page'
},
proposeEdit: {
has: 'true',
reason: ''
},
delete: {
has: 'false',
reason: 'You don't have domain permission to delete this page'
},
comment: {
has: 'false',
reason: 'You can't comment in this domain because you are not a member'
},
proposeComment: {
has: 'true',
reason: ''
}
},
summaries: {},
creatorIds: [
'AltoClef'
],
childIds: [],
parentIds: [],
commentIds: [],
questionIds: [],
tagIds: [],
relatedIds: [],
markIds: [],
explanations: [],
learnMore: [],
requirements: [],
subjects: [],
lenses: [],
lensParentId: '',
pathPages: [],
learnMoreTaughtMap: {},
learnMoreCoveredMap: {},
learnMoreRequiredMap: {},
editHistory: {},
domainSubmissions: {},
answers: [],
answerCount: '0',
commentCount: '0',
newCommentCount: '0',
linkedMarkCount: '0',
changeLogs: [
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '22834',
pageId: '8r4',
userId: 'AltoClef',
edit: '2',
type: 'newEdit',
createdAt: '2017-10-15 10:37:37',
auxPageId: '',
oldSettingsValue: '',
newSettingsValue: ''
},
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '22833',
pageId: '8r4',
userId: 'AltoClef',
edit: '1',
type: 'newEdit',
createdAt: '2017-10-15 09:15:12',
auxPageId: '',
oldSettingsValue: '',
newSettingsValue: ''
}
],
feedSubmissions: [],
searchStrings: {},
hasChildren: 'false',
hasParents: 'false',
redAliases: {},
improvementTagIds: [],
nonMetaTagIds: [],
todos: [],
slowDownMap: 'null',
speedUpMap: 'null',
arcPageIds: 'null',
contentRequests: {}
}