If an old persistent subscription is recreated but then immediately
destroyed because it is out of date, the subscription tree will have no
leaf subscriptions on it. This was resulting in a crash when attempting
to destroy the subscription tree.
A simple NULL check fixes this problem.
Change-Id: I85570b9e2bcc7260a3fe0ad85904b2a9bf36d2ac
There have been crashes and general instability seen in the pubsub code,
so this patch introduces three changes to increase the stability.
First, the ownership model for subscriptions has been modified. Due to
RLS, subscriptions are stored in memory as a tree structure. Prior to my
patch, the PJSIP subscription was the owner of the subscription tree.
When the PJSIP subscription told us that it was terminating, we started
destroying the subscription tree along with all of the individual leaf
subscriptions that belong to the tree. The problem with this model is
that the two actors in play here, the PJSIP subscription and the
individual leaf subscriptions, need to have joint ownership of the
subscription tree. So now, the PJSIP subscription and the individual
leaf subscriptions each have a reference to the subscription tree. This
way, we will not actually free memory until no players are left that
care. The PJSIP subscription is a bigger stakeholder, in that if the
PJSIP subscription's reference to the subscription tree is removed, the
subscription tree instructs the leaf subscriptions to shut down and drop
their references to the subscription tree when possible. The individual
leaf subscriptions, upon being told to shut down, can drop their stasis
subscriptions or whatever they use to learn of new state, and then drop
their reference to the subscription tree once they are ready to die.
Second, the lifetime of a PJSIP subscription's reference to our
subscription tree has been altered. As I learned from doing a deep dive,
the PJSIP evsub code can tell Asterisk multiple times that the
subscription has been terminated, and not all of these times
are especially helpful. I have altered the message flow that we use for
SIP subscriptions such that we will always drop the PJSIP subscription's
reference to the subscription tree when we send the NOTIFY that
terminates a SIP subscription. This also means that we will now queue
NOTIFY requests to be sent after responding to incoming SUBSCRIBEs so
that we can have predictable state changes from the PJSIP evsub code.
Third, the synchronization of operations has been improved. PJSIP can
call into our code from a serializer thread (e.g. upon receiving an
incoming request) or from the monitor thread (e.g. when a subscription
times out). Because of this, there is the possibility of competing
threads stepping on each other. PJSIP attempts to do some
synchronization on its own by always keeping the dialog lock held when
it calls into us. However, since we end up pushing tasks into the
serializer, the result was that serialized operations were not grabbing
the dialog lock and could, as a result, step on something that was being
attempted by a different thread. Now we ensure that serialized
operations grab the dialog lock, then check for extenuating
circumstances, then proceed with their operation if they can.
Change-Id: Iff2990c40178dad9cc5f6a5c7f76932ec644b2e5
Users of functions which call __ast_str_helper() such as the ones listed
below are likely to not check the return value for failure so ensuring
that the string is always nil terminated is a good safety measure.
ast_str_set_va()
ast_str_append_va()
ast_str_set()
ast_str_append()
Change-Id: I36ab2d14bb6015868b49329dda8639d70fbcae07
In a realtime based system with a limited number of threadpool threads
it is possible for a deadlock to occur. This happens when permanent
endpoint state is updated, which will cause database queries to be done.
These queries may result in URI validation being done which is done
synchronously using a PJSIP thread. If all PJSIP threads are in use
processing traffic they themselves may be blocked waiting to get the
permanent endpoint container lock when identifying an endpoint.
This change moves URI validation to occur at use time instead of
configuration time. While this comes at a cost of not seeing a problem
until you use it it does solve the underlying deadlock problem.
ASTERISK-25486 #close
Change-Id: I2d7d167af987d23b3e8199e4a68f3359eba4c76a
In September 2006, the maximum packetization time (ptime) were set to such a
low value, packetization was disabled for many codecs actually. This was fixed
for many codecs but not for iLBC 30. This enables packetization for iLBC which
can be enabled for example via allow=ilbc:60,gsm,alaw,ulaw in the file sip.conf.
ASTERISK-7803
Change-Id: I2ef90023d35efb7cb8fe96ed74f53f6846ffad12
With Asterisk 13, the structures ast_format and ast_codec changed. Because of
that, the paketization timing (framing) of the RTP channel moved away from the
formats/codecs. In the course of that change, the ptime of the callee was not
honored anymore, when the optional autoframing was enabled.
ASTERISK-25484 #close
Change-Id: Ic600ccaa125e705922f89c72212c698215d239b4
Error response code descriptions may contain wiki markup that need to be
escaped. Without this patch, Confluence will reject the document being sent
and the responsible script will raise an exception.
Change-Id: I21fcb66fee7f6332381f2b99b1b0195dff215ee5
When ab803ec342 was committed, it accidentally forgot to actually *add* the
HOLD_INTERCEPT function. This highlights two interesting points:
* Gerrit forces you to put the patch as it is going to into the repo up for
review, which Review Board did not. Yay Gerrit.
* No one apparently bothered to use this feature, or else they don't know about
it. I'm going to go with the latter explanation.
ASTERISK-24922
Change-Id: Ida38278f259dd07c334a36f9b7d5475b5db72396
This patch adds some minor tweaks for autosupport to update it for Asterisk 13.
This includes:
* Finally removing most references to Zaptel
* Adding support for some additional 'core' commands, and fixing nomenclature
that generally hasn't been used for some time
* Adding some PJSIP/SIP commands to gather endpoints/peers and active channels
Change-Id: Ic997b418cbd9313588b6608e50f47b0ce6f4f1f1
On v13, loading several thousand PJSIP endpoints on Asterisk start causes
a deadlock most of the time.
Thanks to mdu113 for discovering that there was a call to pgsql_exec() not
protected by the pgsql_lock reentrancy lock.
{quote}
I believe a code path exists that attempts to use pgsql connection without
locking pgsql_lock. I believe what happens during that deadlock that I
see is two concurrent threads are both attempting to send query to pgsql,
one of the thread is using a code path without locking pgsql_lock. If
they managed to send queries at the same time, it seems postgres ignores
one of the queries and replies only to the one of them. If it happens so
that the thread holding the lock didn't receive the reply it will wait for
it (and hold the lock) forever (or at least for very long time), thus
completely blocking all access to db.
{quote}
* Added missing reentrancy locking around pgsql_exec() in find_table().
* Moved unlock of pgsql_lock in unload_module() to avoid locking inversion
between the psql_tables list lock and the pgsql_lock.
ASTERISK-25455 #close
Reported by: mdu113
Patches:
res_config_pgsql.c-connlock2.diff (license #5543) patch uploaded by mdu113
Change-Id: Id9e7cdf8a3b65ff19964b0cf942ace567938c4e2
To quote Olle:
"When issuing a hangup due to RTP timeouts the cause code is not set. I have
selected 44 based on Cisco's implementation..."
ASTERISK-25135 #close
Reported by: Olle Johansson
patches:
rtp-timeout-cause-1.8.diff uploaded by Olle Johansson (License 5267)
Change-Id: Ia62100c55077d77901caee0bcae299f8dc7375fc
The memory corruption could happen if the [section](+) is the last section
in the file with trailing comments. In this case process_text_line() has
left *last_cat is set to newcat and newcat is destroyed.
Change-Id: I0d1d999f553986f591becd000e7cc6ddfb978d93
An #include right after a [section](+) would associate any variable
assignments before a new section in the #include with the wrong section.
* Fix section association by setting the current section to the appended
section.
* Fix '+' and '!' section flag interaction corner case depending upon
which flag came first. If the '!' came first then it would be ignored.
If the '!' came after then it would affect the appended section. The '!'
will now no longer be ignored.
ASTERISK-25461 #close
Reported by: Sean Pimental
Change-Id: Ic9d3191c8758048e2cbce6432f854b32531731c3
The struct send_request_wrapper has a pjsip lock associated with it that
is created non-recursive. There is a code path for the struct
send_request_wrapper lock that will attempt to lock it recursively. The
reporter's deadlock showed that the thread calling endpt_send_request()
deadlocked itself right after the wrapper object got created.
Out-of-dialog requests such as MESSAGE, qualify OPTIONS, and unsolicited
MWI NOTIFY messages can hit this deadlock.
* Replaced the struct send_request_wrapper pjsip lock with the mutex lock
that can come with an ao2 object since all of Asterisk's mutexes are
recursive. Benefits include removal of code maintaining the pjsip
non-recursive lock since ao2 objects already know how to maintain their
own lock and the lock will show up in the CLI "core show locks" output.
ASTERISK-25435 #close
Reported by: Dmitriy Serov
Change-Id: I458e131dd1b9816f9e963f796c54136e9e84322d
In ast_rtp_read, the value of the variable 'mark' which we try to assign to a
frame->subclass.frame_ending may be 0, 1 or (1<<23), but we should translate
it to 0 or 1.
ASTERISK-25451 #close
Change-Id: I53bdf5c026041730184a6a809009c028549ce626
Return AST_PRESENCE_NOT_SET when CustomPresence AstDB key does not
exist, i.e. when a new CustomPresence is added in the dialplan.
ASTERISK-25400 #close
Reported by: Andrew Nagy
Change-Id: I6fb17b16591b5a55fbffe96f3994ec26b1b1723a
When we decide we will no longer schedule an RTCP write, we remove the
reference to the RTP instance, then assign -1 to the stored scheduler ID
in case something else comes along and wants to see if anything is scheduled.
That scheduler ID is on the RTP instance. After 60a9172d7e was merged to
fix the regression introduced by 3cf0f29310, this improper assignment on a
potentially destroyed object started getting tripped on the build agents.
Frankly, this should have been crashing a lot more often earlier. I can only
assume that the timing was changed just enough by both changes to start
actually hitting this problem.
As it is, simply moving the assignment prior to the ao2 deference is sufficient
to keep the RTP instance from being referenced when it is very, truly,
aboslutely dead.
(Note that it is still good practice to assign -1 to the scheduler ID when we
know we won't be scheduling it again, as the ao2 deref *may* not always destroy
the ao2 object.)
ASTERISK-25449
Change-Id: Ie6d3cb4adc7b1a6c078b1c38c19fc84cf787cda7
If a Via header containes an IPv6 address and a port number is ommitted,
as it is the standard port, we now leave the port empty and to not set it
to the value after the first colon of the IPv6 address.
ASTERISK-25443 #close
Change-Id: Ie3c2f05471cd006bf04ed15598589c09577b1e70
Apparently some endpoints attempt to send a reINVITE before completing the
initial INVITE transaction. In this case PJSIP responds appropriately to
the reINVITE with a 491 INVITE request pending. Unfortunately chan_pjsip
is using the initial INVITE transaction state to determine if an INVITE is
the initial INVITE or a reINVITE. Since the initial INVITE transaction
has not been confirmed yet chan_pjsip thinks the reINVITE is an initial
INVITE and starts another PBX thread on the channel. The extra PBX thread
ensures that hilarity ensues.
* Fix checks for a reINVITE on incoming requests to look for the presence
of a to-tag instead of the initial INVITE transaction state.
* Made caller_id_incoming_request() determine what to do if there is a
channel on the session or not. After a channel is created it is too late
to just store the new party id on the session because the session's party
id has already been copied to the channel's caller id.
ASTERISK-25404 #close
Reported by: Chet Stevens
Change-Id: Ie78201c304a2b13226f3a4ce59908beecc2c68be
When 5c713fdf18 was merged, it allowed for scheduled items to have an ID of
'0' returned. While this was valid per the documentation for the API, it was
apparently never returned previously. As a result, several users of the
scheduler API viewed the result as being invalid, causing them to reschedule
already scheduled items or otherwise fail in interesting ways.
This patch corrects the users such that they view '0' as valid, and a returned
ID of -1 as being invalid.
Note that the failing HEP RTCP tests now pass with this patch. These tests
failed due to a duplicate scheduling of the RTCP transmissions.
ASTERISK-25449 #close
Change-Id: I019a9aa8b6997584f66876331675981ac9e07e39
Some systems require the REFER packet to include a Referred-By header.
If the channel variable SIPREFERREDBYHDR is set, it passes that value as the
Referred-By header value. Otherwise, it adds the current dialog’s local info.
Reported by: Dan Cropp
Tested by: Dan Cropp
Change-Id: I3d17912ce548667edf53cb549e88a25475eda245
When GetConfigJSON was introduced back in 1.6, it returned each
section as an array of strings: ["key=value", "key2=value2"].
Afterwards, it was changed a few times and became
["key": "value", "key2": "value2"], which is not a correct JSON.
This patch fixes that by constructing a JSON object {} instead of
an array [].
Also, the keys "istemplate" and "tempates" that are used to
indicate templates and their inherited categories are now wrapped in
quotes.
ASTERISK-25391 #close
Reported by: Bojan Nemčić
Change-Id: Ibbe93c6a227dff14d4a54b0d152341857bcf6ad8
A deadlock can happen when a sorcery object is being expired from the
memory cache when at the same time another object is being placed into the
memory cache. There are a couple other variations on this theme that
could cause the deadlock. Basically if an object is being expired from
the sorcery memory cache at the same time as another thread tries to
update the next object expiration timer the deadlock can happen.
* Add a deadlock avoidance loop in expire_objects_from_cache() to check if
someone is trying to remove the scheduler callback from the scheduler.
ASTERISK-25441 #close
Change-Id: Iec7b0bdb81a72b39477727b1535b2539ad0cf4dc
Make sorcery_memory_cache_close() call remove_all_from_cache() instead of
partially inlining it.
ASTERISK-25441
Change-Id: I1aa6cb425b1a4307096f3f914d17af8ec179a74c
Basically you should shutdown in the opposite order of how you setup since
later setup pieces likely depend on earlier setup pieces. e.g.,
Registering your external API with the rest of the system should be the
last thing setup and the first thing unregistered during shutdown.
Change-Id: I5715765b723100c8d3c2642e9e72cc7ad5ad115e
In practice the set_role API callback can be invoked even
when no ICE is present on an RTP instance. This can occur
if ICE has not been enabled on it.
ASTERISK-25438 #close
Change-Id: I0e17e4316f0f0d7f095c78c3d4fd73a913b6ba69
* Now conf_alloc() has more off nominal error checking.
* Eliminated RAII_VAR() use in conf_alloc().
* Eliminated a dubius shortcut when destroying cfg->general in
conf_destructor() that would cause a crash if cfg->general failed to get
allocated.
* Add some ACO registration section comments.
Change-Id: Ia40c2b1b2d0777d641605118ae019c5a73865e1a